More guidance is provided in this section of the publishing standards because of the importance of metadata in HealthInsite.
Introduction
Metadata is indexing information about a resource which can be used by search facilities to make searching more precise. Metadata is crucial to HealthInsite for search and navigation functionality. A metadata record is required for each resource selected.
Many web resources are subject to change. Details such as title, creator and description may be affected. Hence it is important that contributors maintain the metadata records within their own sites and update the metadata when they update the content of any resource. The HealthInsite harvester regularly checks sites so that the HealthInsite database can be updated. More recently updated resources will have a higher ranking in HealthInsite search results.
The HealthInsite metadata specification is compliant with the AGLS metadata element set (Australian Standard AS 5044.1-2002 and AS 5044.2-2002). Further details about AGLS are available on the National Archives of Australia website at http://www.naa.gov.au/records-management/create-capture-describe/describe/AGLS/index.aspx. The full HealthInsite specification is at http://www.healthinsite.gov.au/metadata.cfm. Where partner sites are under a mandate to provide full AGLS records, the HealthInsite specification is recommended. Other partners may also wish to provide full records; the work required to produce these is much appreciated.
In 2003, HealthInsite began an alternative approach whereby contributors maintain just 8 key elements of the metadata set. The extra elements are added by HealthInsite staff and are maintained only in the HealthInsite database. This alternative now forms the minimum requirement for partner metadata, as described here. If you have older resources with full HealthInsite metadata records, it is not necessary to replace these with short records (though you may wish to do so at some stage).
Some contributors create additional metadata elements, either for their own purposes or for contributions to other portals. The HealthInsite harvester ignores metadata which does not match the HealthInsite syntax.
Metadata syntax
The HealthInsite Editorial Team will be happy to advise on the options for metadata display and export/import. Most HealthInsite partners have the metadata record embedded into the HTML code for the resource. With this method, any user can see the metadata by viewing the source code of the HTML page. For PDF documents, the metadata record is generally embedded into the HTML cover page that links to the resource.
The metadata must be displayed for the HealthInsite harvester with the following HTML syntax or with XHTML syntax. It is important to use the META syntax with correct punctuation. Upper/lower case is less important. DC stands for Dublin Core, the international standard on which AGLS is based.
<META NAME="DC.Creator" CONTENT="">
<META NAME="DC.Publisher" CONTENT="">
<META NAME="DC.Title" CONTENT="">
<META NAME="DC.Description" CONTENT="">
<META NAME="DC.Language" SCHEME="RFC3066" CONTENT="">
<META NAME="DC.Date.Modified" SCHEME="ISO8601" CONTENT="">
<META NAME="DC.Format" SCHEME="IMT" CONTENT="">
<META NAME="DC.Identifier" SCHEME="URI" CONTENT="">
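As a rough illustration of how a harvester might read these elements, the following Python sketch collects the DC.* META tags from a page. It is not the HealthInsite harvester: the names `DCMetaParser` and `extract_dc_metadata` and the sample page are hypothetical, and only the standard library `html.parser` module is used. Element names are compared without regard to case, in line with the note above that upper/lower case is less important.

```python
from html.parser import HTMLParser


class DCMetaParser(HTMLParser):
    """Collect DC.* META elements from an HTML page (illustrative only)."""

    def __init__(self):
        super().__init__()
        # Maps a lower-cased element name (e.g. "dc.creator") to its values.
        self.records = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before this call.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name.startswith("dc."):
            content = attributes.get("content") or ""
            # Repeated elements (e.g. joint creators) are kept in page order.
            self.records.setdefault(name, []).append(content)


def extract_dc_metadata(html_text):
    """Return a dict of DC element name -> list of CONTENT values."""
    parser = DCMetaParser()
    parser.feed(html_text)
    parser.close()
    return parser.records


# Hypothetical sample page in the style of the syntax shown above.
SAMPLE_PAGE = """
<html><head>
<title>Are you important?</title>
<META NAME="DC.Creator" CONTENT="Smith, Fred">
<META NAME="DC.Creator" CONTENT="Jones, Betty">
<META NAME="DC.Title" CONTENT="Are you important?">
<meta name="keywords" content="ignored">
</head><body></body></html>
"""

records = extract_dc_metadata(SAMPLE_PAGE)
print(records["dc.creator"])
```

Because `HTMLParser` routes self-closing tags through the same handler by default, the sketch also accepts XHTML-style `<meta ... />` elements. Tags whose NAME does not begin with "DC." are skipped.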
Another method is to store the metadata record in a separate HTML file. With this method, HealthInsite needs to know the metadata record URL as well as the corresponding resource URL.
HealthInsite can import metadata from simple Excel spreadsheets where the metadata element names are used as column headings. There are some restrictions on use of special characters such as "&". XML import has been under trial but there are no immediate plans for XML/RDF import.
Metadata creation
These are some of the ways that our current contributors create metadata records:
- Filling in the content details in a raw HTML syntax template, then embedding this block of text in the resource source code.
- Using a metadata tool which has a form interface and some menus to select from. The tool creates the correct syntax.
- Creating metadata as individual database fields within a content management system. Again the system creates the correct syntax for output.
- Copying and pasting records which HealthInsite staff have created using the HealthInsite metadata tool.
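Where a tool or content management system produces the syntax, the output step might look something like the Python sketch below. It is a minimal illustration, not the HealthInsite metadata tool: the function `render_meta_tags` and the dictionary-based record layout are assumptions. The SCHEME values and element order follow the syntax listed under "Metadata syntax".

```python
from html import escape

# Elements that carry a SCHEME attribute in the HealthInsite syntax.
SCHEMES = {
    "DC.Language": "RFC3066",
    "DC.Date.Modified": "ISO8601",
    "DC.Format": "IMT",
    "DC.Identifier": "URI",
}

# The eight key elements, in the order used in this section.
ELEMENT_ORDER = [
    "DC.Creator", "DC.Publisher", "DC.Title", "DC.Description",
    "DC.Language", "DC.Date.Modified", "DC.Format", "DC.Identifier",
]


def render_meta_tags(record):
    """Render a record (dict of element -> value or list of values) as META lines."""
    lines = []
    for element in ELEMENT_ORDER:
        values = record.get(element, [])
        if isinstance(values, str):
            values = [values]
        for value in values:
            scheme = SCHEMES.get(element)
            scheme_attr = f' SCHEME="{scheme}"' if scheme else ""
            # escape() converts &, <, > and quotes so the attribute stays valid HTML.
            content = escape(value, quote=True)
            lines.append(f'<META NAME="{element}"{scheme_attr} CONTENT="{content}">')
    return "\n".join(lines)


# Hypothetical record with joint creators and a title containing "&".
print(render_meta_tags({
    "DC.Creator": ["Smith, Fred", "Jones, Betty"],
    "DC.Title": "Diet & exercise",
    "DC.Format": "text/html",
}))
```

Each value in a list becomes its own META line, which is how joint creators or publishers are expressed in this syntax.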
It is important that the metadata match the resource and that the metadata be derived from the resource. For PDF documents, the resource, HTML cover page and metadata must all match. Be particularly careful with titles and dates.
If you are using a metadata template with default values, always check that the defaults are appropriate for the resource you are currently indexing. If you are copying, pasting and editing metadata from a previous resource (because your new resource is similar), make sure that you do the editing promptly and completely. Proofread to make sure the metadata matches the new resource.
Who creates the metadata?
Initially, HealthInsite staff will provide assistance with metadata creation. In the longer term it is expected that partners will undertake this work for their own sites but HealthInsite will continue to help if needed.
If site management is outsourced, then partners will need to negotiate with their contractors. Some of the issues to clarify are: Which pages need metadata? Who will create the metadata and how? Does the contractor have indexing expertise? Who will check the metadata? Who will keep the metadata up-to-date and how?
Updating
If you are modifying a resource, always check the metadata and update if appropriate. If there has been any change in the real content of the resource, then, as a minimum, the metadata Date.Modified will need to be updated. However, you should also check the other fields. For example: Is there a new title? Has the resource changed so much that it needs a new description? Have you added new language versions? Has a previous PDF document now been made available as HTML?
Organisational changes
Government departments and their subdivisions are especially prone to name changes, but it can happen with other organisations as well. Generally resources should be presented with the correct authentication details as at the time of original publication and the metadata should reflect this. However, for resources which are being updated, the authentication details at the time of update are appropriate.
Metadata on higher level pages, including the home page
Sometimes it is better to route people to the top page of a set of resources rather than directly to individual items. For example, if you have a set of documentation relating to a grants scheme, it may be better to put metadata on the entry level page and not on the individual forms, advertisements, etc.
If you are putting metadata on a higher level page, then generally it should relate to the set of pages encompassed by the higher level page. For example, metadata on a site's home page should relate to the site as a whole. You also need to keep track of any changes to the set of pages encompassed by the higher level page and update the metadata accordingly. For example, in an active site, the metadata Date.Modified on the home page needs to be updated regularly.
The metadata elements
Content must be provided for each element as described below.
Creator
The name of the person or organisation primarily responsible for the content of the resource.
- For textual documents, the author is the creator.
- For personal creators, the format "Lastname, Firstname" is recommended.
- In many organisations, personal authorship is not recorded; the organisation is regarded as the author, not the person. Use the name of the organisation.
- If an organisation name has a well known acronym, add the acronym in brackets at the end of the name.
- Do not enter the name of a person or contractor who has merely converted a resource into an Internet version (for example by marking up a document with HTML coding).
- However, if the resource's content has been commissioned or produced under contract, then it may be appropriate to enter the name of the contractor (personal or company).
- You can enter more than one name for joint creators. Extra names should be added in separate lines with the same META syntax.
Publisher
The name of the entity responsible for making the resource available.
- Generally this will be the name of the site owner.
- For PDF documents, use the publisher name on the resource.
- For older resources, use the name valid for the date on which the resource was published.
- If the publisher name has a well known acronym, add the acronym in brackets at the end of the name.
- You can enter more than one name for joint publications. Extra names should be added in separate lines with the same META syntax.
Title
The name given to the resource.
- Use the title as it appears on the resource itself. Generally this would also match the title in the <title> area of HTML.
- Titles should preferably be in lower case except for the first letter of the first word and proper names.
Description
A textual description of the content and/or purpose of the resource.
- Generally the description should be one to two sentences - just enough to help a user decide whether to follow the link to the resource. Use information from the abstract or summary of the resource if available. In writing a description it is important to step out of your organisational frame of reference and think from the user's point of view.
Language
The language of the content of the resource.
- Scheme: RFC3066 (tags for the identification of languages). A short list of codes is given in Appendix H of the Usage guide on the AGLS Metadata Element Set page: http://www.naa.gov.au/images/agls_usage_guide_v1-3_tcm2-881.pdf
- A more comprehensive list of codes is at http://www.loc.gov/standards/iso639-2/php/English_list.php. If there is no 2-letter code for a particular language, then select a 3-letter code.
- The code for English is "en".
- More than one code can be entered. For example, the metadata might be on a cover page that links to versions of a document in different languages. Use a semi-colon space delimiter between codes.
- Scheme RFC1766 will also be accepted by HealthInsite. Use the short list of codes described above.
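A Language value with several codes could be split and sanity-checked along the following lines. This Python sketch is only an assumption about how such a check might be written. The pattern tests the general shape of a code (two or three letters, optionally followed by subtags) and does not confirm that the code is actually registered. The function names are hypothetical.

```python
import re

# Simplified shape of an RFC 3066 tag: a 2- or 3-letter primary code,
# optionally followed by subtags such as "-AU". Form check only.
_LANGUAGE_TAG = re.compile(r"[A-Za-z]{2,3}(?:-[A-Za-z0-9]{1,8})*")


def split_language_codes(content):
    """Split a DC.Language CONTENT value on semi-colons and trim spaces."""
    return [code.strip() for code in content.split(";") if code.strip()]


def invalid_language_codes(content):
    """Return the codes in content that do not look like language tags."""
    return [
        code for code in split_language_codes(content)
        if not _LANGUAGE_TAG.fullmatch(code)
    ]


print(split_language_codes("en; es; it"))
print(invalid_language_codes("en; english; zh-CN"))
```

A full language name such as "english" is reported as invalid, which catches a common slip when a code is expected.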
Date.Modified
The date when the content of the resource was last updated.
- Scheme: ISO8601 (use formats YYYY or YYYY-MM or YYYY-MM-DD).
- Date.Modified must reflect the currency of the resource content. It should not be the date when the resource was converted to HTML or other format. Nor should it be updated for trivial changes to the presentation of the resource.
- If a resource has not been modified since it was first published, then Date.Modified should be the date of publication.
- Enter the date as fully as it is displayed on the resource - for example, if the resource says September 2000, use 2000-09 not 2000-09-19 and not 2000 alone.
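The three accepted date formats can be checked mechanically. The Python sketch below is one possible check, using only the standard library; the function name `is_valid_date_modified` is hypothetical. It confirms the format and that the date exists on the calendar. It cannot confirm that the date matches what is displayed on the resource.

```python
import re
from datetime import date

# Matches YYYY, YYYY-MM or YYYY-MM-DD and captures each part.
_DATE_PATTERN = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def is_valid_date_modified(value):
    """Return True if value is YYYY, YYYY-MM or YYYY-MM-DD and a real date."""
    match = _DATE_PATTERN.fullmatch(value)
    if not match:
        return False
    year, month, day = match.groups()
    try:
        # A missing month or day defaults to 1 purely for the calendar check.
        date(int(year), int(month or 1), int(day or 1))
    except ValueError:
        return False
    return True


for candidate in ["2000", "2000-09", "2003-07-15", "2003-02-30", "15/07/2003"]:
    print(candidate, is_valid_date_modified(candidate))
```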
Format
The data format of the resource.
Identifier
The URL for the resource.
- Scheme: URI
- For a PDF document, use the cover page URL.
Mock up example
<META NAME="DC.Creator" CONTENT="Smith, Fred">
<META NAME="DC.Creator" CONTENT="Jones, Betty">
<META NAME="DC.Publisher" CONTENT="Australian Society of Nobodies">
<META NAME="DC.Publisher" CONTENT="Australian Society of Important People (ASIP)">
<META NAME="DC.Title" CONTENT="Are you important?">
<META NAME="DC.Description" CONTENT="Everyone is important. 10 tips on how to improve your self esteem.">
<META NAME="DC.Language" SCHEME="RFC3066" CONTENT="en; es; it">
<META NAME="DC.Date.Modified" SCHEME="ISO8601" CONTENT="2003-07-15">
<META NAME="DC.Format" SCHEME="IMT" CONTENT="text/html">
<META NAME="DC.Identifier" SCHEME="URI" CONTENT="http://www.importantpeople.com.au/important.htm">
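Since content must be provided for each of the eight elements, a record like the mock-up can be checked for completeness before it is published. The Python sketch below shows one way this might be done. The function `missing_elements` and the dictionary layout of the record are assumptions made for illustration, not part of any HealthInsite tool.

```python
# The eight key elements required for a HealthInsite metadata record.
REQUIRED_ELEMENTS = [
    "DC.Creator", "DC.Publisher", "DC.Title", "DC.Description",
    "DC.Language", "DC.Date.Modified", "DC.Format", "DC.Identifier",
]


def missing_elements(record):
    """Return the required elements that are absent or have only empty content."""
    missing = []
    for element in REQUIRED_ELEMENTS:
        values = record.get(element, [])
        if isinstance(values, str):
            values = [values]
        # An element counts as present only if at least one value is non-blank.
        if not any(value.strip() for value in values):
            missing.append(element)
    return missing


# Hypothetical draft record with several elements still to be filled in.
draft = {
    "DC.Creator": ["Smith, Fred", "Jones, Betty"],
    "DC.Title": "Are you important?",
    "DC.Description": "",
}
print(missing_elements(draft))
```

An element left with empty CONTENT, as in an unfilled template, is reported in the same way as one that is missing altogether.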
Publishing standards for HealthInsite, v5, June 2007
Updated September 2009