ISD Discussion Document

Changes from 1.01:

  • previous discussion cut out and changes in wording here and there
  • internal double quotes forbidden in fields as discussed
  • the LF line terminator is no longer stated as obligatory; instead I say "whatever the ftp site uses". As long as nic.funet.fi stays the "official" home of the ISD files, that means LF.

    Internet Stamp Databases - Format Description ver 1.02 Martti Tolvanen 1994-02-16

    This document (format.isd) describes the format used in the Internet Stamp Databases (ISD) and gives some guidelines for creating their content. Another document (ideology.isd - still not assembled from previous discussions) will give more information about what the ISD are, what they can be used for and why they took the form they currently have.

    Acknowledgements:

    A major part of the ideas and some of the writing up has come from (in alphabetical order) Dave Mills, Björn Munch and Michael Rys. Steve Anderson and Andy Goodall have provided valuable criticism.


    Contents

    1. Basic file structure and some basic policies

    2. Comment lines

    2.1. Formatted vs. free-format comments

    2.2. Header

    2.3. Place markers

    2.4. Abbreviation lines

    3. Data lines

    3.1. Overview of the fields

    3.2. Fields in detail

    3.2.1. Category

    3.2.2. ID

    3.2.3. Shade

    3.2.4. Paper

    3.2.5. Perforation

    3.2.6. Watermark

    3.2.7. Other

    3.2.8. Date

    3.2.9. Denomination

    3.2.10. Color

    3.2.11. Description

    3.2.12. Quantity printed

    3.2.13. Alt-ID

    3.2.14. Fields that are not included

    3.2.15. Things currently under discussion

    Appendices

    I. List of ISD curators

    II. Standard ISD abbreviations


    1. Basic file structure and some basic policies

    ISD stamp issue listings are intended to be the largest common factor between computerized stamp inventory/trading systems in any hardware and software environment. They will be in the public domain, and will contain only copyright-free information or information for which permission for non-commercial use in the ISD project has been obtained.

    ISD listings will be provided one country per file (or one country plus dependencies etc.). Each ISD file has a named curator who coordinates (and probably also does) all the work on his/her country. See Appendix I for the presently active curators.

    ISD files are database-like text files in which each formatted line describes one stamp or variant. Optional comment lines can be dispersed anywhere between the stamp data lines. Comment lines can contain free text or some parsable special information. Section 2 discusses comment lines in detail and section 3 deals with data lines (the actual stamp descriptions).

    The files must be limited to printable 7-bit characters for maximal portability. Individual curators may adopt various standards for coding "national" characters within a particular country (examples: n~ for Spanish "enye", or ISO 10 for writing Scandinavian characters as }{| ][\, or LaTeX codes). Any such standards will be documented within the ISD files.

    Empty lines are allowed anywhere in the file. No leading spaces are allowed on any line.

    The line terminators used in the ISD files will depend on the ftp site where they are stored. The first home for them will be nic.funet.fi (a.k.a. ftp.funet.fi), in the directory /pub/doc/mail/stamps and its subdirectories. That is a unix machine, so the lines will terminate in LF. If the files are transferred in ascii (text) mode, the line terminators should become automatically transformed to conform to the style of your own computer.
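    As an illustration of the basic file rules above (printable 7-bit characters, no leading blanks, line terminators that depend on where the file came from), a checker could look roughly like the following. It is a minimal Python sketch, not part of the ISD format, and the names read_isd_lines and check_line are hypothetical. Python's universal-newline mode accepts LF, CRLF and CR endings, so a file fetched in binary mode still splits into lines correctly.

```python
def read_isd_lines(path: str) -> list[str]:
    """Read an ISD file whatever its line terminators (LF, CRLF or CR)."""
    # newline=None (universal newlines) turns every terminator into "\n";
    # errors="replace" makes any 8-bit byte show up as U+FFFD instead of failing
    with open(path, encoding="ascii", errors="replace", newline=None) as f:
        return [line.rstrip("\n") for line in f]


def check_line(line: str) -> list[str]:
    """Return the basic-rule violations found on one line (empty list if none)."""
    problems = []
    if line[:1].isspace():
        problems.append("leading blank")
    # printable 7-bit ASCII runs from space (0x20) to tilde (0x7E)
    bad = sorted({ch for ch in line if not " " <= ch <= "~"})
    if bad:
        problems.append("non-printable or 8-bit characters: " + repr("".join(bad)))
    return problems
```

    A line such as "Björn Munch" would be reported, which is why curators need a documented 7-bit convention for national characters.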

    Different data items (fields) are separated by commas. The fields may optionally be surrounded by double quotes ("). No commas are allowed within a field unless the field is enclosed in double quotes. No double quotes are allowed within the fields. The fields should be presented without leading or trailing blanks, which means there are no fixed field lengths. The correct number and order of fields must be kept in each data record.
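    The field rules can be sketched in Python as follows. This is a hypothetical helper, not an official ISD tool. Because double quotes never occur inside a field, a quote can only open or close a quoted field, so a small hand-written splitter is enough. The 13 field names are taken from the %Fields header line in section 2.2.

```python
FIELD_NAMES = ["Categ", "ID", "Shade", "Paper", "Perf", "Wmk", "Other",
               "Date", "Denom", "Color", "Descr", "Qty", "Alt-ID"]


def split_fields(line: str) -> list[str]:
    """Split one ISD data line into fields; quoted fields may contain commas."""
    fields: list[str] = []
    current: list[str] = []
    in_quotes = False
    for ch in line.rstrip("\r\n"):
        if ch == '"':
            in_quotes = not in_quotes      # quotes only delimit; they are dropped
        elif ch == "," and not in_quotes:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    if in_quotes:
        raise ValueError("unterminated double quote")
    fields.append("".join(current))
    return fields


def parse_data_line(line: str) -> dict[str, str]:
    """Map a data line onto the field names, enforcing the field count."""
    fields = split_fields(line)
    if len(fields) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} fields, got {len(fields)}")
    return dict(zip(FIELD_NAMES, fields))
```

    For a made-up line such as ,1,,lN,,,,1991-10-01,0.10,,"Coat-of-arms, small",, the Descr entry comes back as Coat-of-arms, small, with the comma kept inside the field.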

    Each data line should contain a unique description of the stamp/variant with no references to other data lines, so that single lines will remain understandable (knowing previously defined abbreviations may be required). The data lines should be complete enough that one can identify most stamps just by seeing the line (and understanding the abbreviations it contains) *and* by having *any* catalog for the country in question. Recognizing special variants, especially different dies, may require having a special catalog that lists those variants (sorry, you can't have everything in an ASCII listing).

    There is no universal sorting order in which stamps should be presented; each curator is free to arrange them as (s)he likes.

    For file size considerations, most properties of stamp variants are recommended to be presented in a coded or abbreviated form instead of the fully written values. Whenever possible, the abbreviations should have some resemblance to the full forms so that the listings will remain more usable even if no database tools are used. Example: iT for imperforate at top instead of "A".

    All abbreviations used must be stated explicitly in a parsable format within the listing (see section 2.4).

    Most non-distinctive data should be omitted, like perforation in the case of issues where no perforation differences exist. This is both for brevity and for avoiding copyright problems.

    2. Comment lines

    Comment lines are identified by % as the first non-blank character on the line (preferably as the very first character). They can be inserted anywhere in the file. In addition to free-format comments, we've designed special kinds of functional comments where certain rules apply: place markers, certain lines at the start of the header, and definitions of abbreviations. Each type is dealt with in the following sections.

    2.1. Formatted vs. free-format comments

    Formatted comment lines have the form %token:, where the comment token is a string with no spaces, typed right next to the percent sign. The colon may or may not be separated by a space. (See below for examples.) A line starting with % followed by anything that is not a reserved comment token is a free-format comment that can contain any printable 7-bit characters.

    Recommendations for comment lines:

  • max. 70 characters per line for easier printing and displaying
  • in free-format comments leave a space after the percent sign. It improves readability and ensures that there will be no conflicts with future reserved comment tokens
  • please don't write in all caps
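    A parser has to tell the kinds of comment lines apart. The Python sketch below shows one way. The set of reserved tokens is an assumption taken from the sample header in section 2.2 (the final list is still open), and classify_comment and too_long are hypothetical names. A look-alike such as %Currency: falls through to free-format because it is not in the assumed reserved set.

```python
import re

# Assumed reserved tokens, copied from the sample header; not a final list.
RESERVED_TOKENS = {
    "Country", "Country-prefix", "Last-update", "ISD-version", "Curator",
    "e-mail", "Catalog-used", "Archive", "Data-records", "Comment-lines",
    "Fields", "Field-lengths-max", "Field-lengths-min", "Field-lengths-avrg",
}
# %token: or %token : where the token contains no blanks and no colon
TOKEN_RE = re.compile(r"%([^\s:]+) ?:(.*)")


def classify_comment(line: str):
    """Return (kind, token, text) for a comment line, or None for a data line."""
    line = line.rstrip("\r\n")
    if not line.startswith("%"):
        return None
    if line.startswith("%%"):
        return ("place-marker", None, line[2:])
    m = TOKEN_RE.match(line)
    if m:
        token, text = m.group(1), m.group(2).strip()
        # %Abb may carry an optional two-character set id (e.g. %Abbde)
        if token in RESERVED_TOKENS or re.fullmatch(r"Abb[^\s:]{0,2}", token):
            return ("formatted", token, text)
    return ("free", None, line[1:].strip())


def too_long(line: str, limit: int = 70) -> bool:
    """Check the recommended (not required) maximum comment line length."""
    return len(line.rstrip("\r\n")) > limit
```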

    2.2. Header

    Each ISD file will have a header (a number of comment lines preceding the data) that identifies the curator, describes the content etc. Here's a sample header:

    %%: Internet Stamp Databases, reproduction for commercial use prohibited
    %Country: Estonia
    %Country-prefix: ee
    %Last-update: 1993-11-16
    %ISD-version: 1.02
    %Curator: Martti Tolvanen
    %e-mail: Martti.Tolvanen@helsinki.fi
    %Catalog-used: none
    %Archive: anonymous ftp at nic.funet.fi under /pub/doc/mail/stamps
    %Data-records:
    %Comment-lines:
    %Fields: Categ,ID,Shade,Paper,Perf,Wmk,Other,Date,Denom,Color,Descr,Qty,Alt-ID
    %Field-lengths-max:
    %Field-lengths-min:
    %Field-lengths-avrg:
    %Currency: up to June 21 1992 1 rouble = 100 kopeks
    %Currency: starting from June 22 1992 1 kroon (crown) = 100 senti (cents)
    % The change was made at 10 roubles to 1 kroon.
    % Now the kroon is supposed to be coupled to Deutschmark.
    %Copyright (C) Martti Tolvanen 1993
    %
    % This listing covers the post-independence issues of Estonia in
    % as much detail as I know.
    %
    % The numbering is one of my own, simply chronological.
    %
    % The quantities printed are taken from an exhibit by Eesti Post
    % in a stamp show in Helsinki on Nov 6th 1993.
    %
    % Booklets and mini-sheets are put in separate categories BK and MS
    % where numbers are not consecutive. Instead, the ID numbers are taken
    % from the first stamp of the BK/MS.
    %
    % I'll be happy to receive any comments, corrections or new information.
    %
    % Abbreviation format: %Abb:Field,Abbreviation,Meaning,Type
    % "Field" tells you in which part of the data the abbreviation is
    % applicable.
    % "Type" can be C/P/S (complete/prefix/suffix) for indicating
    % whether there may or may not be other characters on either
    % side of the abbreviation.
    % See file format.isd in nic.funet.fi for more info on the format
    % (it's not there yet, though.)
    %
    %Abb:Categ,BK,Booklet,C
    %Abb:Paper,lN,non-phos. paper,C
    %Abb:Paper,lP,phos. paper,C
    %Abb:Descr,CoA,coat-of-arms,
    %Abb:Other,di,die ,P
    ....

    Notes:

  • documentation of comment tokens (obligatory vs. optional) needs work
  • "country-prefix" is a short reference intended to be used in multi-country contexts. It should be the ISO country code whenever there is one
  • currency and copyright are intended to be free-format stuff even if they look formatted.
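    The empty %Data-records: and %Comment-lines: lines in the sample suggest that these values are meant to be filled in mechanically. A rough Python sketch of reading the header and producing the counts follows. The function names are hypothetical, and counting every %-line (header included) as a comment line is an assumption, not a settled rule. read_header deliberately keeps look-alikes such as %Currency, so the caller decides which tokens it trusts.

```python
from collections import defaultdict


def read_header(lines: list[str]) -> dict[str, list[str]]:
    """Collect '%Token: value' lines that precede the first data line.

    A token may repeat (e.g. two %Currency lines), so each token maps to a
    list of values in file order.
    """
    header: dict[str, list[str]] = defaultdict(list)
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue                      # empty lines are allowed anywhere
        if not line.startswith("%"):
            break                         # the first data line ends the header
        token, colon, value = line[1:].partition(":")
        token = token.rstrip()
        # skip place markers (%%...), free text (% ...) and lines without a colon
        if not colon or not token or token.startswith("%") or " " in token:
            continue
        header[token].append(value.strip())
    return dict(header)


def count_lines(lines: list[str]) -> tuple[int, int]:
    """Count (data records, comment lines); blank lines count as neither."""
    data = comments = 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("%"):
            comments += 1
        else:
            data += 1
    return data, comments
```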

    2.3. Place markers

    Finding the original places for the comment lines is a problem if the user wants to re-sort the data. For a solution we have planned that a controlled sorting would involve 1) removing comments, 2) sorting the data, and 3) putting the comments back.

    The key for putting the comments back where they belong without manual editing is a "comment place marker", the characters %% followed by the ID number range where the comment applies and terminated by a colon. Examples of syntactically correct place markers:

    %% 1-4:
    %%1-4:
    %% 1:
    %% :

  • empty range in the last example would refer to all of the data.
  • dash and colon are obligatory terminators of the ID numbers.
  • ranges must be continuous.

    When referring to stamps outside the "main" category, like postage dues, the appropriate category identifier (see section 3.2.1) must precede the ID numbers. Examples:

    %% PD: (applies to all postage dues)
    %% PD10-15:

    Question: should this appear as PD10-PD15?
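    The place-marker syntax above is regular enough for a single regular expression. The Python sketch below is a hypothetical parser (Marker and parse_marker are made-up names). Pending an answer to the question above, it accepts both PD10-15 and PD10-PD15, and it assumes numeric, possibly decimal, IDs as described in section 3.2.2. An empty range is returned with first set to None.

```python
import re
from typing import NamedTuple, Optional

NUM = r"(\d+(?:\.\d+)?)"                         # numeric main ID, e.g. 10 or 549.1
MARKER_RE = re.compile(
    r"%%\s*([A-Z]*)\s*"                          # optional category, e.g. PD
    + r"(?:" + NUM + r"(?:-[A-Z]*" + NUM + r")?)?"  # optional ID or ID range
    + r"\s*:(.*)"                                # obligatory colon, optional text
)


class Marker(NamedTuple):
    category: str              # "" = main category
    first: Optional[float]     # None = empty range
    last: Optional[float]
    text: str                  # free text after the colon


def parse_marker(line: str) -> Optional[Marker]:
    m = MARKER_RE.fullmatch(line.rstrip("\r\n"))
    if not m:
        return None
    categ, first, last, text = m.groups()
    start = float(first) if first else None
    end = float(last) if last else start         # a single ID is a one-item range
    return Marker(categ, start, end, text.strip())
```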

    Use of the place markers:

    A place marker line may or may not contain free text after the colon. Any single-% lines after the place marker until the next %%-line (in the context of comments separated from data) are taken to constitute a comment block that moves together with the place marker line.

    When comments are restored into a re-sorted list or a partial list, the idea is to put each comment block in front of the first occurrence of a data line that falls within the ID range of the place marker.

    (Why do I take up the idea of comment blocks again just after saying I've abandoned it? Because the abbreviation lines currently can't show ranges, and yet they may be placed within the data.)
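    The restoring step can be sketched as follows, assuming the place markers have already been parsed into (category, first, last) ranges and the data lines into (category, ID) pairs. The Python names are hypothetical. Blocks whose range matches no data line (possible in a partial list) are simply appended at the end here; dropping them would be an equally defensible choice.

```python
from typing import Optional

# (category, first ID, last ID, comment lines); first=None means an empty range
Block = tuple[str, Optional[float], Optional[float], list[str]]
# (category, main ID, original data line), already in the new sort order
Record = tuple[str, float, str]


def applies(block: Block, categ: str, ident: float) -> bool:
    b_categ, first, last, _ = block
    if first is None:
        # "%% :" covers all data, "%% PD:" all stamps of that category
        return b_categ in ("", categ)
    return b_categ == categ and first <= ident <= last


def restore_comments(records: list[Record], blocks: list[Block]) -> list[str]:
    out: list[str] = []
    pending = list(blocks)             # unplaced blocks, in original order
    for categ, ident, line in records:
        hits = [b for b in pending if applies(b, categ, ident)]
        pending = [b for b in pending if not applies(b, categ, ident)]
        for block in hits:
            out.extend(block[3])       # each block goes before its first data line
        out.append(line)
    for block in pending:              # no matching data line (e.g. partial list)
        out.extend(block[3])
    return out
```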

    2.4. Abbreviation lines

    (For the moment I leave the Type there copying my latest suggestion, with minor edits)

    Suggestion for an abbreviation syntax:

    %AbbXX:Field,Abbreviation,Meaning,Type

  • identifier %Abb tells it's a parsable abbreviation, and the optional two characters after that may be used by the curator, if (s)he wishes, to create alternative expansion sets to abbreviations (different languages, verbose/terse).
  • Field tells in which field the abbreviation applies.
  • Abbreviation is the string you see within the data.
  • Meaning is the expanded string. If it contains commas, it must be enclosed in double quotes. A trailing space can be preserved with double quotes or by leaving the space before the final comma.
  • Type can be Complete, Prefix or Suffix (C/P/S). C would tell the parser that the string must occur as the only content of the field to have the stated meaning, whereas P/S would tell that there must be something else in one direction from the abbreviation. No type means no specific context defined. The idea of Type is to help write better parsers, but if it seems to make things more complicated, we can forget it.

    Examples:

    %Abb:Perf,L,Line perforation,P
    %Abb:Perf,I,Imperforate,C
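    To make the Type column concrete, here is a minimal Python sketch of reading %Abb lines and expanding one field value. It is a hypothetical helper, not an agreed ISD tool. It expands at most one abbreviation per value (the first match wins), and it treats a missing Type as "replace the first occurrence anywhere in the field", which is only one possible reading of "no specific context". The optional two-character set identifier (as in %Abbde) is kept so that an alternative expansion set can be selected.

```python
import csv
import re
from typing import NamedTuple, Optional

# %Abb, optionally followed by a two-character set id (e.g. %Abbde), then a colon
ABB_RE = re.compile(r"%Abb([^:\s]{0,2}):(.*)")


class Abbrev(NamedTuple):
    set_id: str    # "" for plain %Abb lines
    field: str
    abbr: str
    meaning: str
    kind: str      # "C", "P", "S" or "" (no context given)


def parse_abb(line: str) -> Optional[Abbrev]:
    m = ABB_RE.match(line.rstrip("\r\n"))
    if not m:
        return None
    # csv handles a double-quoted Meaning that contains commas
    parts = next(csv.reader([m.group(2)]), [])
    if len(parts) > 4:
        raise ValueError(f"unquoted comma in Meaning? {line!r}")
    parts += [""] * (4 - len(parts))     # a missing Type means no context
    field, abbr, meaning, kind = parts
    return Abbrev(m.group(1), field, abbr, meaning, kind.strip().upper())


def expand(abbrevs: list[Abbrev], field: str, value: str, set_id: str = "") -> str:
    """Expand the first matching abbreviation in one field value."""
    for a in abbrevs:
        if a.set_id != set_id or a.field != field:
            continue
        if a.kind == "C" and value == a.abbr:
            return a.meaning
        if a.kind == "P" and value.startswith(a.abbr) and value != a.abbr:
            return a.meaning + value[len(a.abbr):]
        if a.kind == "S" and value.endswith(a.abbr) and value != a.abbr:
            return value[: -len(a.abbr)] + a.meaning
        if a.kind == "" and a.abbr and a.abbr in value:
            return value.replace(a.abbr, a.meaning, 1)
    return value
```

    With the line %Abb:Other,di,die ,P from the sample header, the Other value diA expands to "die A"; the trailing space in "die " is what keeps the result readable.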

    Then another rhetorical question: shall we presume that the pretty printers and whatever programs we have for expanding the %Abb lines will automatically add the field name into a verbose description, so that

    %Abb:Wmk,LC,Large Crown,

    is OK, or do we have to include the field name in the definition like in the following?

    %Abb:Wmk,LC,Wmk Large Crown,

    I'm more in favour of the latter style; it would leave more choices to the curator, like enabling the alternatives:

    %Abbde:Wmk,LC,Wz. grosse Krone,
    %Abbes:Wmk,LC,Fil. corona grande,
    %Abbse:Wmk,LC,Vm. stor krona,

    One final comment: now that ranges are not supported in the abbreviations, there is not much point in Michael introducing the shades as pseudo-abbreviations. Or can some of you figure out a mechanism for taking the range from a previous %%-line?

    3. Data lines

    3.1. Overview of the fields

    The fields used in the Internet Stamp Databases are: Category, ID#, Shade, Paper, Perforation, Watermark, Other, Date, Denomination, Color, Description, Quantity printed, Alt-ID. Suggested "official" abbreviations: Categ,ID,Shade,Paper,Perf,Wmk,Other,Date,Denom,Color,Descr,Qty,Alt-ID

    The basic ISD identifier (index key) for each listed stamp variant is formed from fields 1-7 (Category, ID#, Shade, Paper, Perforation, Watermark, Other) separated by dashes, with trailing dashes removed. Therefore each data record must have a unique combination of these fields. Exception: the dash between category and ID is left out, so that most IDs don't look like -245.
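    Building the index key is mechanical. A minimal Python sketch, assuming each record is a dict keyed by the field names of the %Fields header line (index_key and duplicate_keys are hypothetical names):

```python
from collections import Counter

KEY_FIELDS = ["Categ", "ID", "Shade", "Paper", "Perf", "Wmk", "Other"]


def index_key(record: dict[str, str]) -> str:
    categ, *rest = (record.get(name, "") for name in KEY_FIELDS)
    # ID..Other are joined with dashes and trailing dashes (empty fields) removed;
    # the category is written directly in front of the ID, without a dash
    return categ + "-".join(rest).rstrip("-")


def duplicate_keys(records: list[dict[str, str]]) -> list[str]:
    """Keys that occur more than once, i.e. records that break the uniqueness rule."""
    counts = Counter(index_key(r) for r in records)
    return [key for key, n in counts.items() if n > 1]
```

    For example, a postage due with ID 10 on phosphor paper gets the key PD10--lP. Because the Other field uses dashes internally, a key cannot always be split back into its fields by position; it is meant to be compared, not parsed.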

    As an aside, this index key is meant to be used in cross-reference files for various alternative catalog numbers (in the case of copyright-protected numbers the xrefs can be constructed for private use only and must not be distributed over the Internet) or for xrefs to any data that may appear in the future, like picture archives of stamps.

    For compactness of the xref files it is highly recommended that the data presented within fields 1-7 be abbreviated as much as possible.

    The "Other" field can list several properties at once and can be expanded so that any strange properties that we haven't been able to think of right now can be incorporated there, so the field structure should be stable.

    Fields 8-11 (Date, Denomination, Color, Description) are the ones that the user needs for finding the stamp in her/his catalog, the "verbal cross-reference". These fields should be presented clearly enough so that all the "basic stamps" can be identified in any catalog, even though such clarity cannot be achieved in all of the special property fields.

    Field 13 (Alt-ID) is meant for providing a direct catalog cross-reference when the numbering used in the ISD comes from a real catalog.

    3.2. Fields in detail

    3.2.1. Category

    This field is a prefix that has to be used when a listing contains several sections with overlapping sequences of numbers. Suggestions for standard nomenclature of some commonly occurring categories (in "abbreviation syntax"):

    (No category indicated=normal mail)
    %Abb:Categ,AM,Air Mail,C
    %Abb:Categ,O,Official,C
    %Abb:Categ,PD,Postage Due,C
    %Abb:Categ,E,Express,C
    %Abb:Categ,AE,Air Mail Express,C
    %Abb:Categ,MS,Mini Sheet,C
    %Abb:Categ,BK,Booklet,C
    %Abb:Categ,MI,Military Post,C
    %Abb:Categ,TG,Telegraph,C

    (No final spaces in "Air Mail " etc. in this version.) What did I miss? (Not counting country-specific ones.)

    Each curator decides which categories, if any, are used in her/his listing and documents them in the header.

    3.2.2. ID

    The main ID# corresponds more or less to a main catalog number. The choice of numbering is up to the curator, preferably taken directly from an established catalog if permission can be obtained. The main IDs should be entirely numeric, so if the template catalog uses letter prefixes/suffixes in "main types", the letters should be converted into decimals (A549 -> 549.1).

    Any letters etc. that may be part of a catalog number of a variant should be kept out of the main ID. If the curator wants to reproduce the catalog numbering in complete detail, the variant descriptors should be put in the Alt-ID field (the last one).

    An ideal numbering would IMHO assign different numbers only for types that are different in major design, color and denomination. All other differences (perf, watermark, luminescence, shades, dies and plates etc) would be listed under the same ID number with the variant properties given in their appropriate fields (3-7). In this way each user would have an easy way to skip any type of variants that (s)he is not interested in. However, if a listing uses an existing catalog numbering, there may be different main IDs given to stamps with different watermark, perforation, or tagging.

    3.2.3. Shade

    This field is reserved for specifying color differences that are considered to be "variants" of the "same" color. The division is subjective, but still fairly obvious in the context of stamp issues. The "main" color of the stamp, if you have to specify that for describing the difference between two main-ID#s (like in long definitive series) shouldn't be here, but in the Color field. For reproducing catalog numbers and for brevity a curator may choose to insert letter codes here for referring to the colors. These must be explained in comments, of course.

    3.2.4. Paper

    This field will include both "classical" paper varieties (blued vs. white, horizontally vs. vertically laid, plain vs. with silk threads etc.) and luminescence ("tagging", "phosphor") differences. Suggested standards for often occurring luminescence properties: lW and lY for white and yellow phosphorescence, lP for whatever phosphorescence, lF for fluorescence, lN for no luminescence, or l2B, lRB, lLB, lCB for 2 luminescent bands and right, left, and center band. (These should be presented in abbreviation syntax, and maybe moved to an appendix together with all the suggested standard abbreviations.)

    3.2.5. Perforation

    Standard notation of teeth per 20 mm, horizontal gauge x vertical gauge if the sides are different. Partial perforations are to be indicated by giving the imperforate edges as iL, iR, iT, iB for single-sided imperforates, or iH and iV for coil stamps. If one does not care for these variants, having them marked in standard notation makes them easy to filter out. Simply "I" here would indicate imperforate. Perforation type can be indicated as a prefix before the gauge or as a single letter if there are no gauge differences. Suggested abbreviations: L=line perf, B=bullseye perf (comb perf), R=rouletted, Z=zigzag etc. C=coil perf (referring to Dutch special perfs of the 20's and 30's). Any commonly occurring and/or "long" perforations within a country could be best represented in codes (e.g. A and B instead of 14.25x14 and 14.25x14.75).
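    The perforation notation could be parsed roughly as below. This is a hypothetical Python sketch that assumes a value is written as an optional upper-case type letter, then an optional gauge (single, or horizontal x vertical), then any imperforate-edge codes; the text above does not fix the order when these are combined. Curator-defined codes such as A and B would come back as a bare type letter and need the file's %Abb:Perf lines to be interpreted.

```python
import re
from typing import NamedTuple, Optional

GAUGE = r"(\d+(?:\.\d+)?)"
PERF_RE = re.compile(
    r"(?P<kind>[A-Z])?"                            # optional type letter: L, B, R, Z, C, I...
    + r"(?:" + GAUGE + r"(?:x" + GAUGE + r")?)?"   # gauge, or horizontal x vertical
    + r"(?P<edges>(?:i[LRTBHV])*)"                 # imperforate edges: iL iR iT iB iH iV
)


class Perf(NamedTuple):
    kind: Optional[str]            # "I" alone means imperforate
    horizontal: Optional[float]
    vertical: Optional[float]
    imperf_edges: tuple[str, ...]  # e.g. ("T",) for iT


def parse_perf(value: str) -> Perf:
    m = PERF_RE.fullmatch(value)
    if not m:
        raise ValueError(f"unrecognized perforation: {value!r}")
    h, v, edges = m.group(2), m.group(3), m.group("edges")
    horizontal = float(h) if h else None
    vertical = float(v) if v else horizontal      # a single gauge applies to all sides
    # every edge code is two characters, "i" plus the side letter
    sides = tuple(edges[i + 1] for i in range(0, len(edges), 2))
    return Perf(m.group("kind"), horizontal, vertical, sides)
```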

    3.2.6. Watermark

    Watermark type as a code, optionally followed by suffixes S, I, R to indicate wmk sideways, inverted or reversed. Any combination of the suffixes is allowed.
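    Because a watermark code may itself end in S, I or R, the suffixes can only be separated reliably if the base codes are known, for example from the file's %Abb:Wmk lines. A hypothetical Python sketch that tries the longest known code first:

```python
def parse_wmk(value: str, known_codes: set[str]) -> tuple[str, frozenset[str]]:
    """Split a watermark field into (base code, suffixes).

    Suffixes: S = sideways, I = inverted, R = reversed, in any combination,
    each at most once. known_codes is assumed to come from %Abb:Wmk lines.
    """
    if not value:
        return "", frozenset()
    for code in sorted(known_codes, key=len, reverse=True):   # longest match first
        if value.startswith(code):
            rest = value[len(code):]
            if set(rest) <= set("SIR") and len(set(rest)) == len(rest):
                return code, frozenset(rest)
    raise ValueError(f"unknown watermark code or suffixes: {value!r}")
```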

    3.2.7. Other

    A structured field for everything else that may be needed for making distinctions between variants. In order to make this field parsable, too, we suggest that every item in this field has a label and is separated by dashes from other items. The key identifiers ("names") of the properties are strings of two lower case letters (or one letter plus underscore), and the "values" of the properties are given in capital letters and numbers. The case convention is meant to improve the readability of this field. Because the length of the identifiers is fixed and the dash separators are there, the system is actually case-insensitive.

    Question: should we require that the properties are given in some standard order (like alphabetically) in the rare cases where several apply to one variant?

    We'd like to have all of the properties that are to be used in the "Other" field included in a universal list to avoid naming collisions. Here's what we have until now: (should do them in the abbreviation style)

    bp[X]   back printing
    diX     Die/Type
    erX     Error
    foX     format
    guX     Gum
    ha[X]   halved (L/R)
    pc[X]   Precancel (argument specifying the type of precancel if several are available)
    plX     plate
    prX     printing method and/or house
    rp[X]   Reprint (X could indicate year)

    X represents any string consisting of upper case letters and numbers.
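    A sketch of how a program might read the Other field, in Python. parse_other and KNOWN_PROPERTIES are hypothetical names, and the property list is the provisional one above. Since names are always two characters long, the parser can normalise case itself, which is why the case convention matters only to human readers. It does not check which properties require a value (diX) and which merely allow one (ha[X]).

```python
import re

# Provisional property names from the list above
KNOWN_PROPERTIES = {"bp", "di", "er", "fo", "gu", "ha", "pc", "pl", "pr", "rp"}
VALUE_RE = re.compile(r"[A-Z0-9]*")    # "X": upper-case letters and numbers


def parse_other(field: str) -> dict[str, str]:
    """Parse an Other field such as 'diA-plB3' into {'di': 'A', 'pl': 'B3'}."""
    props: dict[str, str] = {}
    if not field:
        return props
    for item in field.split("-"):
        # fixed-length names make the split independent of letter case
        name, value = item[:2].lower(), item[2:].upper()
        if name not in KNOWN_PROPERTIES:
            raise ValueError(f"unknown property {name!r} in {field!r}")
        if not VALUE_RE.fullmatch(value):
            raise ValueError(f"bad value {value!r} for {name!r}")
        props[name] = value
    return props
```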

    Any more ideas of other properties that might be needed for distinguishing any variants of any country?

    3.2.8. Date

    ISO format, yyyy-mm-dd. We'd recommend using official issue dates here, but in case of classical stamps the first known day of use might be better. If the day or month of some issue is unknown, or the curator does not bother to enter the dates, (s)he is allowed to either fill in the m and d values with ?, leave them blank or use 01-01 to signify an unknown date. The year should always be there. If some variant is so poorly known that even the year of issue is uncertain (or if the curator has no way/no time for looking up all the years), we recommend using the year of the original issue for reissues of the same type. Whatever policy the curator chooses should be documented in the headers.
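    The date rules leave several spellings for an unknown month or day. A hypothetical Python normaliser that accepts ?, ?? or an empty part and returns None for anything unknown (parse_isd_date and _part are made-up names):

```python
import datetime
from typing import Optional


def _part(text: str, low: int, high: int) -> Optional[int]:
    if text in ("", "?", "??"):
        return None                        # unknown month or day
    n = int(text)                          # raises ValueError on anything else
    if not low <= n <= high:
        raise ValueError(f"{n} outside {low}-{high}")
    return n


def parse_isd_date(value: str) -> tuple[int, Optional[int], Optional[int]]:
    parts = value.split("-")
    if len(parts) > 3 or len(parts[0]) != 4 or not parts[0].isdigit():
        raise ValueError(f"year missing or malformed: {value!r}")
    year = int(parts[0])
    month = _part(parts[1], 1, 12) if len(parts) > 1 else None
    day = _part(parts[2], 1, 31) if len(parts) > 2 else None
    if month and day:
        datetime.date(year, month, day)    # full calendar check when both are known
    return year, month, day
```

    An 01-01 placeholder cannot be told apart from a genuine 1 January, so a curator using that convention has to say so in the header.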

    3.2.9. Denomination

    We recommend more or less copying what appears on the actual stamp (so that 100 small units and 1.00 large unit are distinct even if they mean the same thing). Currency units may be abbreviated (2 kr instead of 2 kroner), or omitted altogether when there is no danger of misunderstanding, and fractions can be given as decimals (.5 instead of 1/2). Currency units not shown on the stamp may optionally be indicated in parentheses, as in "2 (kr)".

    If there are variants that differ in the currency unit presentation, like certain Spanish General Franco stamps (PTAS vs. PTS), we feel the distinction is too easy to miss (by the human eye) in the denomination field and it should rather appear in the actual stamp description (see 3.2.11 below). If such differences are distinguishing for variants within one main ID number, they would be best listed as die differences.

    Denominations of overprinted issues are recommended to be indicated with a slash: 0.60/0.15 (new/old). If there is a danger of confusion between a fraction and an overprint (like 1/2), currency indicators will help (1(kr)/2(kr)). A curator may choose some other style for showing overprints, as long as it is documented and consistently used so that the overprint collector can find what he needs.

    Surcharges for charity etc. are given in the style 2.10+0.40 if the charity value shows on the stamp, and in the style 1.50(+8.50) if it doesn't. Mini-sheet stamps where a premium is paid over the face value may be problematic with regard to the division of the premium to individual stamps.
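    A rough Python sketch of splitting the denomination conventions above into parts. It is hypothetical: it keeps amounts as the strings written on the stamp (so 100 and 1.00 stay distinct), treats every slash as an overprint (new/old) on the assumption that fractions are written as decimals, and recognises only a simple unit such as kr or (kr) after the amount.

```python
import re
from typing import NamedTuple, Optional

AMOUNT = r"\d*\.?\d+"
SIMPLE_RE = re.compile(
    rf"\s*(?P<face>{AMOUNT})\s*(?P<unit>\(?[A-Za-z]+\)?)?"          # 2, 2 kr, 2 (kr)
    rf"(?:\+(?P<shown>{AMOUNT})|\(\+(?P<hidden>{AMOUNT})\))?\s*"    # +0.40 or (+8.50)
)


class Denom(NamedTuple):
    face: str                      # amount as written, e.g. "2.10" or "100"
    unit: Optional[str]            # e.g. "kr" or "(kr)"
    premium: Optional[str]         # charity surcharge, if any
    premium_on_stamp: bool         # True for 2.10+0.40, False for 1.50(+8.50)
    old: Optional["Denom"] = None  # original value of an overprinted stamp


def _simple(text: str) -> Denom:
    m = SIMPLE_RE.fullmatch(text)
    if not m:
        raise ValueError(f"unrecognized denomination: {text!r}")
    premium = m.group("shown") or m.group("hidden")
    return Denom(m.group("face"), m.group("unit"), premium, m.group("shown") is not None)


def parse_denom(text: str) -> Denom:
    new, slash, old = text.partition("/")   # assumption: "/" always marks an overprint
    denom = _simple(new)
    return denom._replace(old=_simple(old)) if slash else denom
```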

    3.2.10. Color

    Mandatory in cases where the same combination of design/denomination exists in several colors, but a good idea to include in all values of long definitive sets. This field should be used only for "main" colors that define "different" stamps. Shades that are considered to be variants of the major color (and that probably come under the same ID#) should be indicated in the Shade field (3). Names of the colors will of course be subjective based on different catalogs and different perceptions of the curators. Suggested short names for some common color terms: blk, grey, brn, red, orng, ylw, grn, blue, vlt; lt = light, dk = dark. (Again, I should work these into %Abb-lines.)

    For bicoloured issues the colours should be given outer/inner. The notation Multicolour/blue+ can be used to indicate the predominant colour.

    3.2.11. Description

    A free-text verbal description of the design or the idea of the stamp. The description is recommended to be in English if it's not a direct quote from text appearing in the stamp, but basically we're at the mercy of the curators.

    It's a good idea to give some short indication of set relations here so that one sees which stamps belong together without checking the dates. A bad example from consecutive Finnish stamps of 1934-5:

    Kivi
    Calonius
    Porthan
    Chydenius

    ...becomes much better when the Red Cross set is clearly marked

    Kivi centenary
    RC: Calonius
    RC: Porthan
    RC: Chydenius

    3.2.12. Quantity printed

    A useful piece of information for figuring out the relative scarcity of a stamp. We think it should have a reserved place even if not all curators want to enter it (and probably the information can't be found for all issues). This field is not restricted to numbers only, so that one can indicate which variants are pooled into some total quantity.

    3.2.13. Alt-ID

    This is a field in which the curator may put all the extra characters that identify a variant in her/his source catalog but which must be kept out of the ID field to allow filtering of any variants. This is redundant information, but very useful for those who share the curator's favourite catalog. It is also useful because it provides a compact unique ID for writing down in your album/stockbook if you choose to use the numbering system of the listing. The Alt-ID can comprise either just the characters to be appended to the main ID number or the complete string, but the two notations must not be mixed within a single country.

    3.2.14. Fields that are not included

    Things that have been considered as fields but have been left out:

  • everything that is more related to one's own collecting activities, like condition, value, # possessed etc. (everyone will have different needs/tastes)
  • alternative numbering schemes (copyright problems)
  • name of the country (appears in the header)
  • set identifiers and definitive/commemorative identifiers (these things are somewhat debatable, and can be deduced in most cases from the stamp descriptions)

    3.2.15. Things currently under discussion


    Appendices

    Appendix I

    List of ISD curators 1994-01-28

    Question mark after a catalog name indicates that a final permission for the use of numbering is not sure yet.

    Country            Numbering        Language(s)          Curator          Status
    Andorra (Spanish)  Edifil (?)       English              Martti Tolvanen  not started
    Estonia            self-made        English              Martti Tolvanen  complete 1991-93
    Finland            Norma            Finnish and English  Martti Tolvanen  main types complete
    France             Yvert            English              Dave Mills       1849-1991 nearly complete
    Greece             self-made        English              Steve Anderson   50 stamps ready
    Ireland            McDonnell (?)    English              Steve Anderson   not started
    Netherlands        NVPH (?)         English              Ed Voermans      just starting, assisted by Twan Laan (specialized perforation and paper variants)
    Norway             NK               English              Björn Munch      complete 1855-1969 & 1989-94
    Peru               ??               English              Steve Anderson   not started
    Schleswig          self-made        English              Björn Munch      complete
    Spain              Edifil (?)       English              Martti Tolvanen  not started
    Switzerland        Zumstein (?)     German and English   Michael Rys      ready (just waits for Zumstein's permission)
    UK                 Stanley Gibbons  English              Andy Goodall     complete to 1992


    If you are interested in the ISD project, mail one of us:

  • Steven G. Anderson
  • Andy Goodall
  • Twan Laan
  • David Mills
  • Björn Munch
  • Michael Rys
  • Martti Tolvanen
  • Ed Voermans

    II. Standard ISD abbreviations
