Wrangling very large Deep Zoom images

My current project features a large Deep Zoom image. Very large. About 123 gigapixels, in fact. Here are the raw numbers:

  • 20 tile levels (0-19)
  • about 20GB of jpeg images
  • about 2.5 million tiles
  • almost 1.9 million tiles just in the highest level of zoom

First of all, what’s impressive is that Silverlight doesn’t even break a sweat with this image. Deep Zoom is designed such that it almost doesn’t matter how big the raw image is, it only matters how many pixels you’ve got on your screen, because it will always show you the resolution and section of the picture that you want. The main constraint is how fast you can serve (and download) the tiles.

But such a large image and tile set brings some interesting problems, and I wanted to post a bit about some of the issues we’ve found, and how we’ve addressed them.

Lots of files in one place are bad

Although not cripplingly so…

As mentioned earlier, our top level of tiles (the native-resolution level) contained almost 1.9 million tiles, all in a single directory. That’s how Deep Zoom Composer arranges its output and how Silverlight’s MultiScaleImage expects to find them. And it still works. One thing we learned early on, though, is that FAT32 can’t cope with that number of files: it tops out at 65,534 entries per directory (fewer in practice, since long filenames consume multiple entries). We use removable hard drives for convenience, and I would always forget to reformat them to NTFS, then spend hours trying to build our image, only to fail because it ran out of directory entries. Grr.

But FAT32 aside, serving the images from NTFS on these tiny hard drives was very smooth; NTFS didn’t seem at all worried by the number of files. The problem comes when copying the files, particularly to a network server. The copy would always start out fairly nippy, but as it got into the larger directories it would grind to a halt, probably because each new file triggered a directory enumeration over the network; the more files already in the directory, the slower each new copy became.

The solution I decided on was to rearrange the files. I take the X index from the filename and use its digits to generate a chain of subdirectories in which to store the file. For example, the file 142_1232.jpg in level 19 moves from 19\142_1232.jpg to 19\1\4\2\142_1232.jpg. This scheme means that the maximum number of files in any single directory now depends on the height of the image rather than its area – in our specific case, at most 1718 files in a directory.
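For illustration, here’s a minimal sketch of that rearrangement step. It assumes tiles named X_Y.jpg sitting in per-level directories; the helper name and the framing as a standalone method are mine, not from our real build pipeline.

using System.IO;

static void HashTileDirectory(string levelDir)
{
    // e.g. levelDir = @"19", containing files like "142_1232.jpg"
    foreach (string file in Directory.GetFiles(levelDir, "*.jpg"))
    {
        string name = Path.GetFileName(file);            // "142_1232.jpg"
        string x = name.Substring(0, name.IndexOf('_')); // X index: "142"

        // Build one nested subdirectory per digit of X: 142 -> 19\1\4\2
        string subdir = levelDir;
        foreach (char digit in x)
        {
            subdir = Path.Combine(subdir, digit.ToString());
        }
        Directory.CreateDirectory(subdir);

        // 19\142_1232.jpg -> 19\1\4\2\142_1232.jpg
        File.Move(file, Path.Combine(subdir, name));
    }
}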

Now, this is all very well, but MultiScaleImage expects the files in their original places, so how do we fix that? Two approaches would work. The first is URL rewriting on the server side (mod_rewrite on Apache, for example), which would work OK, but we’d need two different solutions depending on whether Apache or Windows is serving the files. The second is to write a custom tile source for MultiScaleImage.

Advantages of a Custom TileSource

The main advantage is that we don’t care where the images are hosted – they remain plain files in a directory structure, and the web server can serve them fast. Avoiding mod_rewrite and the like also spares the servers work that the client can do just as well. And the same solution works whether we’re serving locally (for testing) or from a dev server (which will probably be running Apache).

Problems with a Custom TileSource

Although it’s easy enough to inherit from the base MultiScaleTileSource class, they haven’t made it easy to do exactly what you need. I wanted to create a class whose constructor took a Uri and read size information from an XML file. But this seems to be impossible: MultiScaleTileSource expects you to know the dimensions of your image when its constructor is called, and the protection level of its other members gives you no way to initialise those values afterwards. And since Silverlight has no synchronous way to read from a file, you can’t open the file in the constructor either. Annoying.

In the end, I cheated, because I know how big my image is already. I’ve already got code which takes the Uri of the image from the host HTML page, so I adjusted that to take the path to the new files, along with the width and height, as parameters which I can then pass to my new constructor, bypassing the need to read from a file. It’s not ideal, and I hope that this process is opened up a little in future.

Once I’d arrived at this way of initialising my class, writing the override method to return the tile paths was a little easier, although again I had to hard-code some specific information about my particular image. My image is logically square, but the real tiles only cover a rectangle, so there are lots of virtual tiles which don’t actually exist. DeepZoomImageTileSource handles this with information in the XML file (it’s all part of the ‘sparse’ nature of Deep Zoom images), but I just hard-coded the limits of my tile set.

This solution works well for us. I haven’t tested it to see if it makes a difference with serving the images, but it definitely drastically reduced the time it takes to deploy our images to a server – from something that had already taken days and was slowing down, to something that completed within four hours. So that was a win for us.

I’m not sure how useful this code would be, but I’m including it anyway, for illustration.

using System;
using System.Text;
using System.Windows;
using System.Windows.Media;

public class HashedDeepZoomTileSource : MultiScaleTileSource
{
    private string RootPath;

    public HashedDeepZoomTileSource(string root, int imageWidth, int imageHeight)
        : base(imageWidth, imageHeight, 256, 256, 1)  // 256-pixel tiles, 1-pixel overlap
    {
        RootPath = root;
    }

    /// <summary>
    /// Constructs a tile source given a string describing the root of the image
    /// and the image size.
    /// </summary>
    /// <param name="packedRoot">String containing the root directory (relative
    /// to the XAP file) and the width and height, packed in the following form:
    ///
    ///     root|width|height
    ///
    /// So an image with a root directory of GeneratedImages/uk3_files, a width
    /// of 232000 and a height of 445000 would have the string:
    ///
    ///     GeneratedImages/uk3_files|232000|445000
    ///
    /// If the path is a regular path, we simply construct a normal
    /// DeepZoomImageTileSource instead.</param>
    /// <returns>Either a HashedDeepZoomTileSource or a normal DeepZoomImageTileSource.</returns>
    public static MultiScaleTileSource UnpackPath(string packedRoot)
    {
        string[] parts = packedRoot.Split('|');
        int width;
        int height;
        if (parts.Length != 3 || !int.TryParse(parts[1], out width) || !int.TryParse(parts[2], out height))
        {
            return new DeepZoomImageTileSource(new Uri(packedRoot, UriKind.Relative));
        }
        return new HashedDeepZoomTileSource(parts[0], width, height);
    }

    /// <summary>
    /// This is a hack. It (and maxheights) describes the maximum tile index
    /// available at each tile level of the image.
    /// </summary>
    int[] maxwidths =
    {
        0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
        2, 4, 8, 17, 34, 68, 136, 273, 546, 1093
    };

    int[] maxheights =
    {
        0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
        3, 6, 13, 26, 53, 107, 214, 429, 859, 1718
    };
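
    // Not from the original post: the tables above could also be computed
    // rather than hard-coded. Level L of this 20-level pyramid is scaled down
    // by 2^(19 - L), and the tiles are 256 pixels, so the largest tile index
    // for a dimension is ceil(scaledDimension / 256) - 1. A sketch, assuming
    // the real tiled area is about 280000 x 440000 pixels
    // (e.g. MaxTileIndex(280000, 19) == 1093, matching maxwidths above):
    static int MaxTileIndex(int dimension, int level)
    {
        double scaled = dimension / Math.Pow(2, 19 - level);
        return Math.Max(0, (int)Math.Ceiling(scaled / 256) - 1);
    }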

    /// <summary>
    /// Overrides the GetTileLayers method to provide a URL for the given tile.
    /// We have to create a fully qualified path, because it doesn't like
    /// relative paths.
    /// </summary>
    /// <param name="tileLevel">Which level of tile resolution (0-19, e.g.)</param>
    /// <param name="tilePositionX">X position of the required tile</param>
    /// <param name="tilePositionY">Y position of the required tile</param>
    /// <param name="tileImageLayerSources">List to populate with Uris pointing at the tiles</param>
    protected override void GetTileLayers(int tileLevel, int tilePositionX, int tilePositionY, System.Collections.Generic.IList<object> tileImageLayerSources)
    {
        if (tileLevel >= 0)
        {
            // MASSIVE KLUDGE
            // Our map is 'sparse': its logical size is 480000 square, but we
            // only have tiles for a width of 280000, so we have to avoid
            // returning URLs for tiles that don't exist.
            // We look up the maximum X and Y positions from the tables above.
            if (tilePositionX <= maxwidths[tileLevel] && tilePositionY <= maxheights[tileLevel])
            {
                StringBuilder path = new StringBuilder(RootPath);
                if (RootPath.EndsWith("/") == false)
                {
                    path.Append("/");
                }
                path.AppendFormat("{0}/", tileLevel);

                // One nested subdirectory per digit of the X index: 142 -> 1/4/2/
                foreach (char digit in tilePositionX.ToString().ToCharArray())
                {
                    path.AppendFormat("{0}/", digit);
                }
                path.AppendFormat("{0}_{1}.jpg", tilePositionX, tilePositionY);

                // Anchor the relative tile path to the directory containing the XAP.
                string s = App.Current.Host.Source.ToString();
                s = s.Substring(0, s.LastIndexOf('/') + 1);
                tileImageLayerSources.Add(new Uri(s + path.ToString(), UriKind.Absolute));
            }
        }
    }
}
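
For completeness, here’s roughly how a source like this gets wired up. This is an illustrative sketch rather than our actual startup code: it assumes the packed string arrives via the Silverlight plugin’s initParams under a made-up key, and that the page exposes a MultiScaleImage named Msi.

// In the host HTML (hypothetical key name):
//   <param name="initParams" value="tileSource=GeneratedImages/uk3_files|232000|445000" />

private void Application_Startup(object sender, StartupEventArgs e)
{
    string packed;
    if (e.InitParams.TryGetValue("tileSource", out packed))
    {
        MainPage page = new MainPage();

        // UnpackPath falls back to a plain DeepZoomImageTileSource for
        // ordinary paths, so the same entry point handles both formats.
        page.Msi.Source = HashedDeepZoomTileSource.UnpackPath(packed);
        this.RootVisual = page;
    }
}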


9 comments

  1. We are running into strange behaviors with large TIF files as well. I’m not sure I follow the code, or the DLL that the DZC team has now released. Any pointers to get started with it? Is there a limitation to the size of the TIF we can load into the DZC?

    Thanks!

    1. I’m not sure about specific limits, but I’ve definitely had large images which won’t load or process successfully in DZC. I had a series of TIFFs around 500MB, and a couple of those simply caused DZC to crash out.

      If you can, perhaps running DZC on a 64-bit Windows installation with more than 4GB of RAM might help with large images, although I have no experience of that. Failing that, do you have tools which can split your TIFF into smaller parts? Sometimes DZC can handle a large image if it’s composed of several smaller images.

  2. If you want to stick with the original Deep Zoom format, then make sure you:

    1. Turn off antivirus for the duration of the copy

    2. Use Vista SP1 & Server 2008 or later so you get the best file I/O semantics

    3. Use the /MT option on ROBOCOPY to copy the files. This is available with Server 2008 R2 and Win 7, but runs on earlier versions.

    You can also change the format:

    4. Raise the tile size so there are fewer files at each level.

    5. Use the SmoothStreaming option on DeepZoomTools to generate your content and in Silverlight 3 to view it. You or your CDN can host the project on an IIS7 server with the smooth streaming option. In this case, there is only one file per level.

    Looking forward to seeing your project!

    1. I’d missed the SmoothStreaming option, although that wouldn’t have helped us in our case, because our infrastructure group won’t run Windows servers.

      The project *still* isn’t public, but I’m hoping something will be available at the end of the month. I’ll definitely be posting here to announce it.

  3. You can host the tiles in Windows Azure blob storage. Make the container public and the images are directly accessible.
    Now with the CDN these would be served from 18 different locations around the world.
    The trick with Deep Zoom in Silverlight is that you have to serve the tiles really fast, or it slows the interface right down waiting for older tile requests.
