V5 development progress - Image indexing and thumbnails

  • This post is a short update on one aspect of the development of Zoom V5. It covers the upcoming image indexing features and image thumbnail support.

    Zoom Search Engine v5.0 introduces a new feature that allows users to search for images such as photographs and diagrams. Searching is carried out using metadata associated with the file. Image files such as JPEGs, PNGs and TIFFs can store textual data describing the image, as well as technical metadata detailing the photo-taking conditions, such as the camera make and model, whether the flash was on, the shutter speed and the aperture value. The ImageInfo plugin extracts this metadata, and Zoom indexes it according to its configuration.

    Digital cameras save images as specified by the EXIF (Exchangeable Image File Format) specification. The specification uses existing file formats such as JPEG (Joint Photographic Experts Group) or TIFF (Tagged Image File Format) with the addition of specific metadata tags.

    In addition, a multimedia news exchange format called the Information Interchange Model (IIM) was established to provide additional information, such as caption, news category or dateline. The metadata elements of IIM are commonly known as the "IPTC headers" of digital image files. ImageInfo extracts this metadata based on the EXIF and IIM standards. While the image files supported by ImageInfo are JPEGs, PNGs*, TIFFs and GIFs, different levels of meta information will be available depending on the file type and the way the file was created.
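
    To illustrate where this metadata lives in a JPEG, here is a minimal Python sketch. It is not how ImageInfo works internally (that is not described here). It only walks the JPEG segment markers and reports whether an EXIF block (APP1 segment starting with "Exif") or an IPTC/IIM block (APP13 segment starting with "Photoshop 3.0") is present. The function name is made up for this example.

```python
import struct


def list_jpeg_metadata_segments(data):
    """Return labels for the metadata-bearing APP segments in JPEG bytes."""
    if data[:2] != b"\xff\xd8":              # every JPEG starts with SOI
        raise ValueError("not a JPEG stream")
    found = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            break                            # lost sync; stop rather than guess
        marker = data[pos + 1]
        if marker in (0xDA, 0xD9):           # start of scan / end of image
            break
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            found.append("EXIF")
        elif marker == 0xED and payload.startswith(b"Photoshop 3.0\x00"):
            found.append("IPTC/IIM")
        pos += 2 + length
    return found
```

    Calling it on the bytes of a photo straight from a camera would typically report EXIF only, which matches the point above that the available metadata depends on how the file was created.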

    In addition to indexing metadata, Zoom will index any ALT text associated with an image on an HTML page and any text in the link that points to the image.
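
    As a rough illustration of the idea (not Zoom's actual implementation), the following Python sketch collects the ALT text of each image and the text of each link that points at an image file. The class name and the list of image extensions are assumptions made for this example.

```python
from html.parser import HTMLParser

# Assumed set of image extensions, chosen for this example only.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".tif", ".tiff")


class ImageTextCollector(HTMLParser):
    """Collect ALT text and link text for the images referenced by a page."""

    def __init__(self):
        super().__init__()
        self.texts = {}            # image URL -> list of associated text
        self._link_target = None   # image URL of the <a> we are inside

    def _add(self, url, text):
        text = text.strip()
        if text:
            self.texts.setdefault(url, []).append(text)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self._add(attrs["src"], attrs.get("alt") or "")
        elif tag == "a":
            href = attrs.get("href") or ""
            if href.lower().endswith(IMAGE_EXTENSIONS):
                self._link_target = href

    def handle_endtag(self, tag):
        if tag == "a":
            self._link_target = None

    def handle_data(self, data):
        if self._link_target:
            self._add(self._link_target, data)


collector = ImageTextCollector()
collector.feed('<img src="dog.jpg" alt="A brown dog"> '
               '<a href="cat.png">Photo of a cat</a>')
print(collector.texts)
```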

    It will also be possible to index only images larger than a certain minimum size (to avoid indexing all the small images, like buttons, found on a typical web site).
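
    A minimal sketch of such a filter, assuming "size" means pixel dimensions (it could equally be a file size) and using made-up thresholds and file names:

```python
def large_enough(width, height, min_width=50, min_height=50):
    """Return True if an image meets the (hypothetical) minimum dimensions."""
    return width >= min_width and height >= min_height


# (file name, width, height) tuples standing in for scanned images.
images = [("button.gif", 16, 16), ("photo.jpg", 800, 600)]
to_index = [name for name, w, h in images if large_enough(w, h)]
print(to_index)  # ['photo.jpg']
```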

    In V5 of Zoom a new item of meta information will be supported, ZOOMIMAGE. This will allow you to associate an image with a
    particular page so that it will appear alongside the link in the search
    results. To do this, you will need to insert a meta tag on your pages like
    so:
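
    The tag presumably takes the usual name/content form of an HTML meta tag. That layout is an assumption here, not confirmed syntax, so check it against the Zoom documentation. Under that assumption, a small Python helper (the function name and the example URL are hypothetical) could generate the tag for each page:

```python
from html import escape


def zoomimage_meta_tag(image_url):
    """Build a ZOOMIMAGE meta tag, assuming the standard name/content form."""
    return '<meta name="ZOOMIMAGE" content="{}">'.format(
        escape(image_url, quote=True))


print(zoomimage_meta_tag("http://www.example.com/images/cover.jpg"))
```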

    You can specify the appearance of the images in your search results by
    modifying the CSS in your search template file.

    As an alternative to specifying the thumbnail image by metadata, you will be able to create a directory that contains all your thumbnail images. The thumbnail and the full image are associated via their file names.

    An example of how this looks is below:
    http://www.wrensoft.com/zoom/support/images/image_info_layout_faq/different_thumb.png

    Finally, if you don't have an image prepared for each of your documents, you can instead choose to display a fixed icon for all documents of a particular type, e.g. the MS Word icon for all DOC files.

    http://www.wrensoft.com/zoom/support/images/image_info_layout_faq/same_icon.png

    I would also like to remind everyone that we offer free upgrades for 6 months after a purchase, so if you purchase V4 now, it will be a free upgrade to V5 when it becomes available.

    ------
    David


  • We don't support assigning an image per directory. Only per file or per file type.

    But if you are a web server expert, you could do something tricky, such as using URL rewriting to map all HTTP requests for image files in a particular directory to a single image file. We don't have a script to do this, but it should be possible.
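
    To make the mapping concrete, here is a Python sketch of the rewriting logic only. A real setup would express this as a rewrite rule in the web server's own configuration; the function name, paths and extension list below are hypothetical.

```python
# Assumed set of image extensions, chosen for this example only.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")


def rewrite_image_request(path, directory, shared_image):
    """Map any image request under `directory` to one shared image file."""
    prefix = directory.rstrip("/") + "/"
    if path.startswith(prefix) and path.lower().endswith(IMAGE_EXTENSIONS):
        return shared_image
    return path  # everything else is served unchanged


print(rewrite_image_request("/thumbs/productguide/manual.jpg",
                            "/thumbs/productguide",
                            "/images/pgthumb.jpg"))
```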

    It's easier to just create individual thumbnails :). Here is a very quick command that could probably use refinement... but it works.

    find productguides | awk -F"/" 'NR>1 { gsub(/\.(pdf|xls|doc|txt|htm|html)$/, ".jpg", $NF); print "cp /home/images/pgthumb.jpg \"/web/images/" $NF "\"" }' | sh

    This command will find any file in the /productguides folder and create a thumbnail from a single JPG file you specify, for each of those files.

    That way, if you have a special icon you want to use for a category of documents, you can use it no matter what the file extension is.
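
    The same idea can be sketched in Python for anyone not on a Unix shell. This is an illustration only: the function name is made up, and unlike the one-liner above it skips files whose extension is not in the list instead of copying the thumbnail under their original name.

```python
import shutil
from pathlib import Path

# Same extensions as in the awk command above.
DOC_EXTENSIONS = {".pdf", ".xls", ".doc", ".txt", ".htm", ".html"}


def copy_thumbnail_for_each(doc_dir, thumb_source, thumb_dir):
    """Copy one thumbnail image once per document found under doc_dir.

    Each copy is named after the document (guide.pdf -> guide.jpg) and
    placed directly in thumb_dir. Returns the created file names.
    """
    thumb_dir = Path(thumb_dir)
    thumb_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for doc in sorted(Path(doc_dir).rglob("*")):
        if doc.is_file() and doc.suffix.lower() in DOC_EXTENSIONS:
            target = thumb_dir / (doc.stem + ".jpg")
            shutil.copyfile(thumb_source, target)
            created.append(target.name)
    return created
```

    For example, copy_thumbnail_for_each("productguides", "/home/images/pgthumb.jpg", "/web/images") would mirror the paths used in the command above.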


  • Some additional preliminary documentation explaining the usage of the new image handling features can be found here:
    How to index images (http://www.wrensoft.com/zoom/support/faq_plugins_image.html)
    How to customize the image search results layout (http://www.wrensoft.com/zoom/support/faq_plugins_image_layout.html)


  • MHT is not really an image file format. And we haven't had any requests for its support, at least not from anyone that actually uses it. We'll probably look at Open Office some time soon. But as most of these 'open' file formats are XML based, they should already work with the current version of Zoom.

    I am a bit confused about your request for image search as this was the exact topic of my initial post?

    We plan on providing searching for other binary file types (.ZIP, .EXE, .MOV, etc.) by their file names, but not by their content, in V5.

    ------
    David


  • You can download a beta version of the image plugin here:
    http://www.wrensoft.com/ftp/imageinfo_beta.zip


  • That is great... Could you support the "MHT" format if you don't already? Also support for Open Office files if you don't already. Also ZIP, RAR and any other type of document, so that they can at least be summarized in the search...

    Other things such as image search, video search and other searches would be great..


  • In V5 of Zoom a new item of meta information will be supported, ZOOMIMAGE. This will allow you to associate an image with a
    particular page so that it will appear alongside the link in the search
    results. To do this, you will need to insert a meta tag on your pages like
    so:

    You can specify the appearance of the images in your search results by
    modifying the CSS in your search template file.

    Hello,

    I index a books website, and on each page the only image that is indexed is the image associated with the page, i.e. a screenshot of the book. Is there any way of getting this image next to the search result (of the page text) without actually adding extra HTML?

    I ask this because only one image is picked up by Zoom on each page.


  • Hello,

    Is it possible to have the image ALT text as the actual clickable text? It is much better to have 'title of book' as opposed to 'short title.jpg'.

    Or, would it be possible to include it in the RSS output?

    Thanks


  • There are several ways to add an image or icon next to each search result.

    1) Add ZOOMIMAGE meta data to each page which tells Zoom which image to use with the page.

    2) Display the same icon for every page of a particular type. e.g. a PDF icon

    3) Link pages and images using the page file name. For example, you can create a series of image files that have the same names as the pages. So if your file was dog.html, the image file could be dog.jpg.

    In each instance Zoom needs to be told which image should be used. It never attempts to guess if an image might be appropriate for the page in question. These options can be selected from the "Scan options" tab in the Zoom configuration window.
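
    Options 2 and 3 above boil down to simple lookups, which the following Python sketch illustrates. It is not Zoom's code: the icon table, the "thumbs" directory name and the function names are placeholders for this example.

```python
from pathlib import PurePosixPath

# Hypothetical icon table for option 2; the file names are placeholders.
ICON_BY_EXTENSION = {".pdf": "icons/pdf.gif", ".doc": "icons/word.gif"}


def thumbnail_by_type(page_name):
    """Option 2: one fixed icon per file type, or None if none is set."""
    return ICON_BY_EXTENSION.get(PurePosixPath(page_name).suffix.lower())


def thumbnail_by_name(page_name, thumb_dir="thumbs", thumb_ext=".jpg"):
    """Option 3: same base name as the page, inside a thumbnail directory."""
    stem = PurePosixPath(page_name).stem
    return str(PurePosixPath(thumb_dir) / (stem + thumb_ext))


print(thumbnail_by_name("dog.html"))    # thumbs/dog.jpg
print(thumbnail_by_type("manual.PDF"))  # icons/pdf.gif
```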


  • As of V5 beta 13, you can now specify thumbnails for ALL supported file extensions. This means you can even create thumbnails for your PDF documents, PPT slideshows or HTML web pages (using third-party thumbnail generating applications), and have Zoom display them alongside your search results.

    To enable this in Zoom, double-click on the extension in the "Scan Options" tab of the Configuration window, and click "Configure Images". Here you can select "Display different thumbnails for each file" and specify the thumbnail options as before (including changing the file extension for the thumbnails as required).


  • IMG ALT text is currently indexed in V5 so you can search for it and find the image.

    However, there is currently no option to use it as the title for the image link. We currently use either the meta title stored inside the image file (if available and configured to do so), or the filename itself. We may consider adding an option to use the ALT text for title if there is enough demand.


  • One of our users asked us if it would be possible to make the search results appear with only thumbnails and nothing else, in a grid-like fashion. We made a grid layout example to illustrate this and thought we'd post it here for people to see what is possible with some configuration and CSS modifications.

    Note that this is just an example, and is one of many possible layouts that can be created with Zoom and CSS.

    Below is an actual screenshot of Zoom set up to show thumbnails only:

    http://www.wrensoft.com/forum/wrensoftimages/zoom_images_sample.gif

    To achieve this, you will need to turn off all the other elements in the search results (via the Configuration window, under the "Results Layout" tab) so that only the image is displayed, along with the text link. The text link cannot be hidden from there, so we will hide it via CSS.

    Then, in your search_template.html file, where you can customize the CSS for your search results, make the following changes:

    .result_title { font-size: 100%; display: none; }

    This will hide the search result links so that only the images are shown.

    .result_block { margin-top: 15px; margin-bottom: 15px; display: inline; }
    .result_altblock { margin-top: 15px; margin-bottom: 15px; display: inline; }

    This will allow the search results to appear next to each other, as opposed to being on separate lines.

    You might also want to push the "Result pages" part to the next line, like so:

    .result_pages { clear: left; }

    Further changes could of course be made to get it closer to your ideal appearance. We will update the documentation in the final release with more information on the new CSS classes available in V5.


  • Is there any chance of assigning a thumbnail based on the category?

    I have many documents (PDF, DOC, TXT, etc.) in different categories. I think it is a great feature to be able to show a thumbnail to identify the file type.

    I suppose I could do it by having a thumbnail for each file in the category, but with over 150,000 files that's a lot of thumbnails to have to generate.

    For example, every file in the /docs/productguide/ folder could have a particular thumbnail, and every file in /docs/releasenotes/ could have another thumbnail, etc.

    My categories follow my directory structure, and it would be nice to have a global thumbnail for each directory instead of one for each file.

    Thanks






