Everything you asked about taxonomy development with Heather Hedden
A joint blog post with Heather Hedden, taxonomist and the author of "The Accidental Taxonomist"
The product information we make available to prospective and existing customers includes characteristics about the things our companies make and sell. Characteristics often include things like size, color, and ingredients, which are captured in product attribute lists.
The characteristics of our target audiences can also be captured as attributes. While attributes help make it possible to distinguish one product from related products that exist in the same category, and assist us in helping consumers discover information they need to make decisions, they can also help us deliver right-fit content to those searching for it.
Attributes designed to improve discovery for specific audience segments for a family of health-related products might include characteristics like age group (adults, teenagers, toddlers, infants, newborns), gender, and healthcare conditions (diabetes, allergies, high blood pressure). Attributes for a clothing line might include obvious details about products like size and fabric, but might also contain details about the stretchability, durability, and silhouette of the product, as well as the country of origin.
Heather Hedden, an experienced taxonomist/ontologist and author of the book "The Accidental Taxonomist" (Information Today Inc., 2010, 2016), joined The Content Wrangler on Zoomin’s sponsored show for a chat about taxonomy development. They discussed the need for, and role of, attributes, facets, and hierarchical taxonomies, and the content capabilities they make possible when implemented thoughtfully.
Heather talked about the differences and relationships between attributes and facets, and how they relate to hierarchical taxonomies. She explained how attributes relate to metadata and how they are used to help serve up the right content to the right people, in the right language and format, at the right time.
As the fascinating discussion with Scott Abel, The Content Wrangler, ebbed and flowed, the audience shared many interesting and impactful questions, which could not be answered on the live show. Heather kindly agreed to answer the audience's questions and to share her expertise and worldview on these topics.
To watch the full show: Understanding Attributes And Taxonomies
Hidden labels are essentially a sub-type of alternative labels. If your taxonomy is configured to display alternative labels to end-users, you may designate certain labels as hidden labels so that they will not display, because they are not appropriate for display. However, this all depends on your implementation.
Either approach could be taken, and it depends on the specific software products as to which works best: keeping data in two systems and integrating them with an API, or migrating data from one system to be with data in another, namely from a terminology management system to a SKOS-based taxonomy/thesaurus management system. SKOS supports all kinds of knowledge organization systems, including terminologies (multiple languages, alternative labels, multiple definitions, examples, notations, etc.). Unless there is some important additional feature in the terminology management system that the SKOS taxonomy/thesaurus management system lacks, or you have a very large terminology already managed in a terminology system (much larger than your taxonomy will be), I think it probably makes more sense to migrate the terms, their definitions, and other data from the terminology management system to the SKOS system, thus replacing the terminology management system. There are other advantages to using a SKOS system, as it opens up other linked data options and the ability to extend to a knowledge graph.
Yes, that’s the classic top-down vs. bottom-up approach question about taxonomy design, which also applies to an ontology. After defining your use case and taxonomy scope, be practical and start with what you have: existing taxonomies, glossaries, term lists, metadata schema, spreadsheet tables of term types or other metadata. An analysis of this data, especially column headers and worksheet tabs, will suggest candidates for taxonomy facets or ontology classes (which can be the same, but they function differently in a front-end application for a different query experience). It’s much more common that an organization already has a taxonomy, or components of one, but no ontology, and thus an ontology is added to an existing taxonomy and other named entity controlled vocabularies. A further analysis of business objects or entities that have a business use case, and perhaps some brainstorming with stakeholders, can suggest additional taxonomy facets or ontology classes. Once you have the ontology classes defined with example entities or concepts belonging to a class, you can then define the attributes and the relationships. Further detailed building out of the taxonomy with more specific concepts may then follow.
Additionally, whether the semantic knowledge model is developed from the point of view of concrete expressions (instances) or of abstract classes depends above all on what the knowledge modeler already knows about the domain.
For a taxonomy, which is one or more hierarchies of concepts, a mind-mapping tool is not needed. Most people start out creating hierarchical taxonomies in a spreadsheet program, with deeper levels of hierarchy in succeeding columns to the right. And for initial brainstorming, instead of mind-mapping, it is recommended to start with card sorting to lay out an initial structure of the knowledge domain. When designing an ontology, however, some people find that a mind mapping tool is useful, especially for naming the semantic relationships between classes.
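The spreadsheet layout described above, with each deeper level one column to the right, can be converted into broader/narrower pairs with very little code. The following is only a minimal sketch: the sample rows and the function name `rows_to_broader` are invented for illustration, and it assumes that concept labels are unique and that no level is skipped.

```python
import csv
import io

# Hypothetical spreadsheet export: each deeper level sits one column to the right.
SHEET = """Apparel,,
,Tops,
,,T-shirts
,,Sweaters
,Bottoms,
,,Jeans
"""

def rows_to_broader(rows):
    """Map each concept label to its broader (parent) label, or None for a top concept."""
    broader = {}
    path = []  # path[i] holds the most recent label seen at level i
    for row in rows:
        for level, cell in enumerate(row):
            label = cell.strip()
            if not label:
                continue
            if level > len(path):
                raise ValueError(f"{label!r} skips a hierarchy level")
            del path[level:]   # forget labels at this level and deeper
            path.append(label)
            broader[label] = path[level - 1] if level > 0 else None
    return broader

broader = rows_to_broader(csv.reader(io.StringIO(SHEET)))
for label, parent in broader.items():
    print(f"{label} -> broader: {parent}")
```

A mapping like this is one possible starting point for importing a spreadsheet draft into a taxonomy management tool, whose actual import format will differ.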
To clarify, user search terms/strings are not the same as a folksonomy unless you capture, store, and make the terms available for reuse. A displayable, reusable set of search terms is what would be called a folksonomy, although such implementations are rare.
Enabling search based on user search strings, in general, is standard and expected. User search strings must match concept labels, whether preferred, alternative, or hidden. Preferred labels are what display in a hierarchical display. Preferred and alternative labels are what display in an alphabetical, type-ahead, or search-suggest term display. Hidden labels are designated to never display to users, but they may still match user search strings. The user is then redirected to the search result set without seeing that their search string matched a hidden label of a concept tagged to the content.
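The difference between the three label types can be illustrated with a small sketch. The `Concept` class, its field names, and the exact-match, case-insensitive comparison are assumptions made for this example; real search engines and taxonomy tools implement matching in their own ways.

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    pref_label: str
    alt_labels: list = field(default_factory=list)
    hidden_labels: list = field(default_factory=list)

    def matches(self, query):
        """A search string may match any label type, including hidden labels."""
        q = query.strip().lower()
        labels = [self.pref_label, *self.alt_labels, *self.hidden_labels]
        return any(q == label.lower() for label in labels)

    def suggestions(self):
        """Labels that may be displayed in type-ahead: preferred and alternative only."""
        return [self.pref_label, *self.alt_labels]

def find_concept(concepts, query):
    """Return the first concept with a label matching the query, or None."""
    for concept in concepts:
        if concept.matches(query):
            return concept
    return None

# Hypothetical concept with a common misspelling stored as a hidden label.
concepts = [Concept("Diabetes", alt_labels=["Diabetes mellitus"], hidden_labels=["diabetis"])]
hit = find_concept(concepts, "Diabetis")
print(hit.pref_label)      # the concept is found through its hidden label
print(hit.suggestions())   # the hidden label is never displayed
```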
In taxonomy creation and editing, it is useful to analyze a search log report to consider adding some of those search strings as alternative or hidden labels for concepts.
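As a rough sketch of that analysis, the search strings that match no existing label can be counted and ranked by frequency. The log entries, the label set, and the `min_count` threshold below are invented for illustration; a real search log report would come from your search platform, and a taxonomist would still review each candidate.

```python
from collections import Counter

# Hypothetical data: labels already in the taxonomy (lowercased) and a raw search log.
known_labels = {"diabetes", "diabetes mellitus", "allergies"}
search_log = ["diabetes", "sugar disease", "Sugar Disease", "allergies", "hay fever", "sugar disease"]

def candidate_labels(log, labels, min_count=2):
    """Return unmatched search strings seen at least min_count times, most frequent first."""
    counts = Counter(s.strip().lower() for s in log)
    return [(s, n) for s, n in counts.most_common() if s not in labels and n >= min_count]

candidates = candidate_labels(search_log, known_labels)
print(candidates)
```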
In addition to search logs, there are several other sources to continuously expand an existing taxonomy, including text.
Yes, good observation. If there are numerous items in the same specific subcategory, then the method of further refinement, namely refinement by attributes, needs to be quite specific and extensive.
SKOS does not have a special feature for attributes and thus does not have an explicit way to differentiate attributes and hierarchical concepts. But SKOS is flexible in its use. Since a controlled vocabulary of attributes lives outside of the hierarchy (although connected through relationships), in SKOS, a separate concept scheme should be used for maintaining attributes, separate from the concept scheme(s) for the hierarchical concepts.
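A minimal sketch of this arrangement follows, written as plain Python that prints SKOS in Turtle syntax. The `ex:` namespace, the scheme and concept identifiers, and the helper functions are all hypothetical; only the `skos:` terms come from the W3C vocabulary. The relationships that connect attributes to hierarchical concepts are left out.

```python
PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix ex: <http://example.com/taxonomy/> .\n\n"
)

def scheme(scheme_id, label):
    """Declare a concept scheme (one container per controlled vocabulary)."""
    return f'ex:{scheme_id} a skos:ConceptScheme ;\n    skos:prefLabel "{label}"@en .\n'

def concept(concept_id, label, scheme_id):
    """Declare a concept and assign it to exactly one scheme."""
    return (
        f"ex:{concept_id} a skos:Concept ;\n"
        f'    skos:prefLabel "{label}"@en ;\n'
        f"    skos:inScheme ex:{scheme_id} .\n"
    )

turtle = PREFIXES + "\n".join([
    scheme("productCategories", "Product categories"),   # hierarchical concepts
    scheme("productAttributes", "Product attributes"),   # attribute values
    concept("sweaters", "Sweaters", "productCategories"),
    concept("wool", "Wool", "productAttributes"),
])
print(turtle)
```

Keeping the attribute values in their own scheme means the distinction is carried by scheme membership rather than by any special SKOS feature.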
Do:
Don’t:
It is indeed becoming less common to build new taxonomies from scratch. But taxonomies get out of date, or were built for only a limited use (one department), and need to be revised, merged, expanded, etc. New companies, startups, and new lines of business need to build new taxonomies. Many companies are unique and need to build their own taxonomies, but it is true that for describing products and services of certain businesses, the same set of metadata can be shared. There are taxonomies available for sale/for license.
It depends on the nature of the content. For example, research articles are suited for a thesaurus, a public website is suited for a hierarchical taxonomy, an intranet is suited for a faceted taxonomy, and e-commerce for products is suited for a hierarchical taxonomy plus attributes.
This is called tagging, indexing, categorizing, and sometimes annotation. It can be manual or automated, or a combination of automated suggestions with human review/approval and supplemental tagging. Fully automated tagging can be done in a taxonomy management system that includes auto-tagging, whereas manual tagging, or automated tagging with manual review, is done in an application such as a content management system or SharePoint.
The taxonomist considers how many item results there are for each category and combination of filters: a few to select from (3 or more), but not so many that they do not fit on one page display. Others might contribute to the decision, such as a product manager or user experience designer.
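One way to check this during design is to count the results for every category and filter combination. This is only a sketch: the product records, the field names, and the default `page_size` of 24 are assumptions, while the minimum of 3 comes from the guideline above.

```python
from collections import Counter

# Hypothetical product records, each tagged with a category and one attribute value.
products = [
    {"category": "Sweaters", "fabric": "Wool"},
    {"category": "Sweaters", "fabric": "Wool"},
    {"category": "Sweaters", "fabric": "Wool"},
    {"category": "Sweaters", "fabric": "Cotton"},
]

def review_combinations(items, attribute, min_results=3, page_size=24):
    """Label each (category, attribute value) combination by its result count."""
    counts = Counter((item["category"], item[attribute]) for item in items)
    review = {}
    for combo, n in counts.items():
        if n < min_results:
            review[combo] = "too few"
        elif n > page_size:
            review[combo] = "too many"
        else:
            review[combo] = "ok"
    return review

review = review_combinations(products, "fabric")
for combo, verdict in review.items():
    print(combo, verdict)
```

Combinations flagged as too few may suggest an attribute value that is too specific, and those flagged as too many may call for a further attribute or subcategory.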
A thesaurus is a type of controlled vocabulary with specific structure and features (broader, narrower, and related term relationships, alternative labels or used-from terms, scope notes). But sometimes people use “controlled vocabulary” to refer to a term list that does not have a hierarchy.
If there are multiple products of a certain type, then a new subcategory can be created. Otherwise, an existing category can be renamed to slightly broaden its meaning in order to include the new product type.
The World Wide Web Consortium is the organization that has developed standards and guidelines for the World Wide Web and the Semantic Web. The abbreviation for World Wide Web Consortium is W3C. The standard for taxonomies from the W3C is SKOS (Simple Knowledge Organization System).
The word “tags” is not strictly defined. So, yes, you can call attributes tags, since they are tagged, as metadata, to content. And, yes, attributes can be tagged over different categories. Actually, “categories” is not strictly defined, but they are understood to be something in a hierarchy: either any taxonomy concept in a hierarchy, or just the top levels of a hierarchy, but just not named entity instances.
If it’s for a public ecommerce implementation, then, of course, you can look at competitor ecommerce websites. I have done that before when I consulted on an ecommerce taxonomy.
Yes, sometimes a subcategory can appear under more than one broader category, if it is a valid hierarchical relationship (according to the standards) in both cases. This way, users browsing down different hierarchical paths still come to the desired subcategory. This is called polyhierarchy, because the subcategory (concept) belongs to more than one hierarchy. It appears to the end user in more than one place, but it is the same subcategory tagged to the same content in either location.
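Polyhierarchy can be sketched as a concept that lists more than one broader concept. The category names, the SKU, and the `paths_to` helper below are invented for illustration, and the sketch assumes the hierarchy contains no cycles.

```python
# Each concept maps to its list of broader concepts; top concepts have an empty list.
broader = {
    "Baby care": [],
    "Hair care": [],
    "Baby shampoo": ["Baby care", "Hair care"],  # polyhierarchy: two broader concepts
}

# Content is tagged once, to the concept itself, not to a position in the hierarchy.
tagged_content = {"Baby shampoo": ["SKU-1001"]}

def paths_to(concept, broader):
    """Return every browse path from a top concept down to the given concept."""
    parents = broader.get(concept, [])
    if not parents:
        return [[concept]]
    return [path + [concept] for parent in parents for path in paths_to(parent, broader)]

paths = paths_to("Baby shampoo", broader)
for path in paths:
    # Both browse paths end at the same concept and therefore the same content.
    print(" > ".join(path), tagged_content[path[-1]])
```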