Pipeline

Build a pipeline that generates DZI files for you, as well as a data file containing information about all of those files.


To set up the project on your local machine, please follow the directions provided in the README.md file. If you run into any issues with running the project source code, then feel free to reach out to the author in the course's Discord channel.


[00:00 - 00:10] So far, we've just been converting files directly in the script folder, but for a real project, we'll want things to be more organized. Also, if you have dozens of images, you probably don't want to have to be downloading them all manually.

[00:11 - 00:20] Furthermore, you may have additional information about the images that you want to have access to in your app. In this lesson, we'll upgrade our script to support all these needs.

[00:21 - 00:33] I've gone to Wikimedia Commons and found a number of large images for us to work with. I've stored them in a JSON file called image-source-data.json, with each image being an object inside the images array.

[00:34 - 00:45] Each image has an imageUrl, the actual image file, and a pageUrl, the web page that gives more information about that image. We'll want to convert the image, but also store the pageUrl so that we can use it in our web app.

[00:46 - 00:58] Keep in mind that, together, these images take up a lot of space, 1.16 GB for the original images and 650 MB for the DZIs. If you don't have much free space on your hard drive, you'll want to cut down this list before working with it.

[00:59 - 01:16] This file, with the name starting with Lambda, is the biggest, 453 MB by itself, so it would be the first one to drop. Note that the DZI files are actually smaller than the original images in this case, even though the DZI files make up a fully tiled pyramid and therefore have multiple copies of the image at different sizes.

[01:17 - 01:37] This is because the default vips dzsave image compression is evidently stronger than the compression used on the original images. If we had the same image compression all around, the DZI should theoretically take up about 1 1/3 times as much space as the original, because the DZI includes the original plus a 1/4 size version plus 1/4 of that, etc.
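The 1 1/3 figure comes from summing 1 + 1/4 + 1/16 and so on. As a rough sketch of that arithmetic (the function name `pyramidSizeRatio` is made up for illustration, and it assumes every level is compressed identically):

```javascript
// Relative storage of a tile pyramid compared to the original image,
// assuming each level has 1/4 the pixels of the level above it.
function pyramidSizeRatio(levels) {
  let total = 0;
  for (let level = 0; level < levels; level++) {
    total += Math.pow(1 / 4, level); // 1, 1/4, 1/16, ...
  }
  return total;
}

console.log(pyramidSizeRatio(1));  // 1 (just the original)
console.log(pyramidSizeRatio(2));  // 1.25
console.log(pyramidSizeRatio(15)); // close to 4/3, about 1.333
```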

[01:38 - 01:51] The most dramatic case here is the Lambda file, which is being converted from a 453 MB PNG to a series of JPEG tiles adding up to 138 MB. This is because JPEG compression is lossy and, in most cases, produces much smaller files than PNG's lossless compression.

[01:52 - 01:59] Okay, it's time to update our script. We're starting with the script from the previous module. The first step is to load in the JSON data, like so.

[02:00 - 02:13] We're using the built-in fs module, which is Node's collection of file system functions. We'll need to require it. Before, we worked with images we had downloaded manually, but now let's automate the downloading process based on the imageUrls in our data file.
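The loading step might look something like the sketch below. The helper name `loadSourceData` is hypothetical, and the assumption that the JSON file sits next to the script is mine; the file's shape follows the description earlier in the lesson.

```javascript
const fs = require('fs');

// Read and parse the source data file. The expected shape is:
// { "images": [ { "imageUrl": "...", "pageUrl": "..." }, ... ] }
function loadSourceData(filePath) {
  const text = fs.readFileSync(filePath, 'utf8');
  return JSON.parse(text);
}

// Usage, assuming the file sits next to the script:
// const sourceData = loadSourceData('image-source-data.json');
// console.log(sourceData.images.length + ' images to process');
```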

[02:14 - 02:21] To keep things organized, let's download them into a folder. First, we specify the download location using Node's path module.

[02:22 - 02:35] We can now make our loop run through the images from the JSON, and use the download library to download them. Of course, we'll need to require download, but in this case we also need to install it since it's not built into Node.

[02:36 - 02:47] Add a quick package.json to your scripts folder, listing download as a dependency. The npm command line tool would love for you to have more details in your package.json, but for our purposes, this is all you need.
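A bare-bones package.json along those lines might be the following. The version range is an assumption; use whichever version of `download` the course's project source specifies.

```json
{
  "dependencies": {
    "download": "^8.0.0"
  }
}
```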

[02:48 - 02:53] Go to the scripts folder in your terminal and run the install. You should be able to run the script now and have it download the files.

[02:54 - 03:01] You'll need to comment out the conversion step for now, since we haven't yet updated that. You can run it in the terminal like so, assuming it's called pipeline.js.

[03:02 - 03:09] Keep in mind that it may take a while depending on your internet connection speed. Okay, let's step back a moment and get ourselves some names for things.

[03:10 - 03:21] We've just been using fileName, but that's not really accurate anymore since it's a URL. Furthermore, we'll want to use the whole URL for the download, but then just part of the URL for the resulting file.

[03:22 - 03:28] For instance, the URL for one of the images is this. But for the file that results from the download, we just want this.

[03:29 - 03:36] And then for the DZI file, we want this name. To help us with that, we can use Node's url module.

[03:37 - 03:45] Inside our for loop, we can parse the imageUrl and get the file name without the rest of the URL, both with its extension and without its extension.

[03:46 - 03:56] Note that we're also decoding the special URI codes like %2C for comma. We do this because the files on disk need unencoded names: the server decodes the URL it receives before it looks for the matching file.

[03:57 - 04:09] For instance, when we ask for this, the server will look for this. We could of course get around this by not having file names with special characters, but this is the sort of real world data you'll have to deal with, so you might as well work with it.
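The parsing and decoding described above can be sketched as follows. The helper name `getFileNames`, the property names it returns, and the example URL are all hypothetical, and the lesson's script may use the url module differently (for example `url.parse`).

```javascript
const path = require('path');
const { URL } = require('url');

// Split an image URL into the file names the pipeline needs.
function getFileNames(imageUrl) {
  const pathname = new URL(imageUrl).pathname;         // "/images/Some_Image%2C_1900.jpg"
  const encodedName = path.basename(pathname);         // "Some_Image%2C_1900.jpg"
  const nameWithExt = decodeURIComponent(encodedName); // "Some_Image,_1900.jpg"
  const nameNoExt = path.parse(nameWithExt).name;      // "Some_Image,_1900"
  return { nameWithExt, nameNoExt };
}

console.log(getFileNames('https://example.com/images/Some_Image%2C_1900.jpg'));
// { nameWithExt: 'Some_Image,_1900.jpg', nameNoExt: 'Some_Image,_1900' }
```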

[04:10 - 04:25] In our app, we'll also want a name to display for the image. While we could include a hand-picked name for each image in our image-source-data.json, we can just clean up the file name a little for now by using the decoded URL and converting underscores to spaces like so.
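As a small sketch of that cleanup (the helper name `makePrettyName` and the sample file name are made up for illustration):

```javascript
// Turn a decoded file name (without its extension) into a display name
// by replacing every underscore with a space.
function makePrettyName(nameNoExt) {
  return nameNoExt.replace(/_/g, ' ');
}

console.log(makePrettyName('Some_Image,_1900')); // Some Image, 1900
```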

[04:26 - 04:40] Now that we have our prettyName, we can retire the old fileName and use prettyName for all of our logging and just use data.imageUrl for the download. For the conversion, we need to know where to find the image (that is, where it was downloaded) and where we want the converted output to go.

[04:41 - 04:46] To keep things organized, let's have a DZI folder, creating it if necessary. And then use that folder.

[04:47 - 04:56] Note that convertSrc includes the file extension, since that will be part of the downloaded file's name, but convertDsc does not, because vips dzsave automatically adds the .dzi extension.
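One way to set up the folder and the two paths is sketched below. The helper name `getConvertPaths` and its parameters are hypothetical; `convertSrc` and `convertDsc` follow the names used in the lesson.

```javascript
const fs = require('fs');
const path = require('path');

// Work out where the downloaded image lives and where the DZI should go,
// creating the DZI folder if it does not exist yet.
function getConvertPaths(downloadDir, dziDir, nameWithExt, nameNoExt) {
  fs.mkdirSync(dziDir, { recursive: true }); // no error if it already exists
  return {
    convertSrc: path.join(downloadDir, nameWithExt), // includes the extension
    convertDsc: path.join(dziDir, nameNoExt),        // vips dzsave adds ".dzi"
  };
}
```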

[04:57 - 05:01] Okay, with that info, we can now do our conversion. If you want, you can give that a try.
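The conversion call could be sketched like this, assuming the vips command line tool is installed and on your PATH. The helper names are hypothetical, and the script from the previous module may invoke vips in a different way.

```javascript
const { execFileSync } = require('child_process');

// Build the argument list for `vips dzsave <input> <output>`.
function dzsaveArgs(convertSrc, convertDsc) {
  return ['dzsave', convertSrc, convertDsc];
}

// Run the conversion and show vips output in the terminal.
function convertToDzi(convertSrc, convertDsc) {
  execFileSync('vips', dzsaveArgs(convertSrc, convertDsc), { stdio: 'inherit' });
}

// Usage: convertToDzi(convertSrc, convertDsc);
```

Passing the arguments as an array rather than building one command string means file names containing spaces or commas need no extra quoting.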

[05:02 - 05:11] Since you've already downloaded the files from the previous step, you can comment out the download portion for now. Now that we have all of our images converted, we also want to save all the info we have about them.

[05:12 - 05:19] We'll save that out as a JSON named image-data.json for use in our client-side code. We'll need an array to collect the image info in.

[05:20 - 05:28] Since it's going to be saved as a JSON, we'll make it a property of an object. Each time, after converting an image, we can push that image's info into the array.

[05:29 - 05:33] And then, at the end, we can write out the JSON. The resulting file should look like so.
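Collecting and writing the data can be sketched as follows. The helper names and the exact property names (`name`, `imageUrl`, `pageUrl`) are assumptions based on the description in this lesson, not a copy of the lesson's script.

```javascript
const fs = require('fs');

// The array is wrapped in an object so the JSON file has a named property.
const outputData = { images: [] };

// Call once per image, after its conversion succeeds.
function recordImage(prettyName, dziUrl, pageUrl) {
  outputData.images.push({ name: prettyName, imageUrl: dziUrl, pageUrl });
}

// Call once at the end of the script.
function writeImageData(filePath) {
  fs.writeFileSync(filePath, JSON.stringify(outputData, null, 2));
}

// Usage:
// recordImage(prettyName, 'dzi/' + nameNoExt + '.dzi', data.pageUrl);
// writeImageData('image-data.json');
```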

[05:34 - 05:42] Notice that the imageUrl properties in this new file point to our converted images rather than the originals on the web. Notice also that we have our cleaned-up name properties.

[05:43 - 05:49] The pageUrl is identical to what came in from image-source-data.json. Go ahead and give it a run.

[05:50 - 06:00] You've taken a list of images on the web, downloaded them, converted them to DZI and created a data file with information about all of them. The next step is to use them in your OpenSeadragon app.

[06:01 - 06:06] We'll tackle that in the next module. But first, see the next lesson for some exercises to practice what you've learned so far.