I S K O
1.1: History of knowledge organization
2. Research traditions, approaches and basic theoretical issues in KO
A: Approaches developed inside of KO:
2.1 Practicalist and intuitivist approaches
2.2 Consensus based approaches
2.3 Facet-analytic approaches
2.4 User-based and cognitive approaches
2.5 Domain-analytic/epistemological approaches
B: Approaches developed outside of KO:
3. KO on different technological platforms:
3.2 KO in classical bibliographic databases
This article presents the field of knowledge organization (KO) and its core perspectives: knowledge organization processes (KOPs) and knowledge organization systems (KOS). It provides a brief overview of research traditions, approaches and basic theoretical issues in the field (practicalist and intuitivist approaches, consensus-based approaches, facet-analytic approaches, user-based and cognitive approaches, domain-analytic/epistemological approaches, bibliometric approaches, and IR approaches, among others). The article also briefly presents KO on different technological platforms (physical libraries, archives, museums, classical bibliographical databases and the Internet). The article argues that KO as a part of library and information science can be considered KO in a narrow sense, but that the broader sense of KO is needed to provide the necessary knowledge for the narrow sense.
Knowledge Organization (KO) is a field of research, teaching and practice, which is mostly affiliated with library and information science (LIS). KO is first and foremost institutionalized in professorships at universities around the world, in teaching and research programs at research institutions and schools of higher education, in scholarly journals (for example, Knowledge Organization, 1993-), in national and international conferences, and in national and international organizations (for example, the International Society for Knowledge Organization, ISKO, cf. Dahlberg 2010).
KO is about describing, representing, filing and organizing documents and document representations as well as subjects and concepts both by humans and by computer programs (cf. Hjørland 2008). For these purposes, rules and standards are developed, including classification systems, lists of subject headings, thesauri and other forms of metadata. The organization of knowledge into classification systems and concept systems is a core subject in KO. The two main aspects of KO are (1) knowledge organization processes (KOP) and (2) knowledge organization systems (KOS). Knowledge organization processes (KOP) are, for example, the processes of cataloging, subject analysis, indexing, tagging and classification by humans or computers. Knowledge organization systems (KOS) are selections of concepts with an indication of selected semantic relations. Examples are classification systems, lists of subject headings, thesauri, ontologies and other systems of metadata.
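The distinction between a KOS and a KOP can be illustrated with a minimal sketch. It assumes a toy thesaurus in which every concept carries broader, narrower and related terms, and a deliberately crude indexing process that assigns every term found in a text. The names `Concept`, `build_kos` and `index_document`, and the sample terms, are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Concept:
    """One entry in a toy KOS: a term plus selected semantic relations."""
    label: str
    broader: Set[str] = field(default_factory=set)   # hierarchical relation (BT)
    narrower: Set[str] = field(default_factory=set)  # hierarchical relation (NT)
    related: Set[str] = field(default_factory=set)   # associative relation (RT)


def build_kos(broader_pairs, related_pairs):
    """Build a KOS from (narrower, broader) pairs and (term, term) pairs."""
    kos = {}

    def get(label):
        return kos.setdefault(label, Concept(label))

    for narrow, broad in broader_pairs:
        get(narrow).broader.add(broad)
        get(broad).narrower.add(narrow)
    for a, b in related_pairs:
        get(a).related.add(b)
        get(b).related.add(a)
    return kos


def index_document(text, kos):
    """A crude KOP: assign every KOS term that occurs literally in the text."""
    lowered = text.lower()
    return sorted(label for label in kos if label in lowered)


# Hypothetical sample data.
kos = build_kos(
    broader_pairs=[
        ("thesauri", "knowledge organization systems"),
        ("classification systems", "knowledge organization systems"),
    ],
    related_pairs=[("thesauri", "indexing")],
)
terms = index_document("A study of indexing with thesauri.", kos)
```

Here the dictionary `kos` plays the role of the system, while `index_document` plays the role of the process. Real indexing, whether by humans or by machines, involves subject analysis that goes far beyond literal string matching.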
Among the different aspects of the history of KO are the following:
(1) History of library classification systems
This aspect covers library classifications from ancient times until today. The main work about this topic is Samurin 1964. The history of library classification is planned to be published in the following parts: I. Introduction and premodern classification, covering the period from ancient times until the rise of modern library classifications and classification theory in the last quarter of the 19th century; II. Modern library classification; III. Postmodern library classification and an article OPAC and library discovery system. In the 20th century, the most important theoretical development was, according to the dominant understanding in the field, the development of facet-analytical theory. It is based on Aristotelian logic and is discussed by Hjørland (2013b). The Dewey Decimal Classification has become the dominant system internationally, but it is often criticized for its lack of theory and for being in important ways suboptimal compared to other systems. Library classifications today also face competition from systems developed in other contexts (cf. Martínez-Ávila 2016).
(2) History of the classifications of the sciences
This aspect is part of the field because library classifications are often based on classification of the sciences (even if there are also phenomena classifications which are not). The main work about this topic is Kedrow 1975, and Peirce is among the contributors. A major work of an interdisciplinary nature is Machlup (1982), which was a part of an ambitious project about knowledge and information production. A related problem is the organization of encyclopedias. Today bibliometric mapping is a dominating method in the study of relations between research fields, but the development of research classifications for administrative purposes, like the Frascati Manual (OECD 2015), and domain-analytic studies (e.g., Wallerstein et al. 1996) are also important.
(3) History of scientific taxonomies (classification in the sciences)
This includes the systematic botanists such as Konrad Gesner and Carl Linné, as well as developers of the periodic system of chemical elements such as Mendeleev and Meyer.
(4) History of the theory of classification and concept theory
There is no known main work covering this field. Important contributors include Aristotle, Darwin, Wittgenstein, Rosch, Kuhn and many others (see Hjørland 2017, section 4). Definition and combination of concepts were also studied in projects of ideal languages by such authors as Llull, Bisterfeld, Dalgarno, Wilkins and Leibniz. Recent debates concerning numerical, evolutionary and cladistic approaches are also related.
(5) History of knowledge organizing systems and processes
There is no known main work on this topic, but Keyser 2012 is among the many texts. The history of indexing and alphabetization belongs here.
(6) History of knowledge organization as a discipline
The development of KO as a discipline for research and teaching is mainly tied to the development of library and information science as a university discipline (or professional school discipline), that is, after 1850. People like Charles A. Cutter, W. C. Berwick Sayers and Ernest Cushing Richardson established knowledge organization as an academic field around 1900. Henry Bliss's book (1929) The organization of knowledge and the system of the sciences also represents one of the main intellectual contributions establishing the field. These authors argued that book classification should be based on knowledge organization as it appears in science and in scholarship learning. Two other important events in the development of KO as an organized field of study, both led by Ingetraut Dahlberg, were the creation of the journal International Classification (1974) (from 1993 renamed Knowledge Organization) and the establishment of the International Society for Knowledge Organization (ISKO) in 1989, of which the journal became an official organ. Describing the history of the field is difficult. For example, the thesaurus today is clearly a part of the discipline, but it was originally external to it. Also, what to consider important contributions depends on the metaperspective from which the field is considered. An article about the history of the discipline that emphasizes the names used for its subject field is Hider (2018).
Traditionally, approaches to KO are divided into human-based approaches versus machine-based approaches (cf. Anderson and Pérez-Carballo 2001a, b). There are, however, many different kinds of human approaches and many different kinds of computer-based approaches, and they are not necessarily always distinct. For example, human-based approaches may be very mechanical if humans just follow simple rules that they have learned, such as an alphabetical arrangement, or finding the best matches for book titles in a given KOS. Both humans and machines may or may not base their classification on citations, but if they both do, they are applying a similar approach. Hjørland (2011b), therefore, argued that this traditional distinction is theoretically unfruitful. Alternatively, it has been suggested that human indexers as well as programmers are guided by their knowledge and theories, which at the deepest level are connected to their (often implicit) theories of knowledge. However, it is often difficult to reveal what kind of theoretical assumptions guide the KOPs. Such processes are often done intuitively, and some systems have been difficult to relate to a theory. Nevertheless, the following eight traditions, inside and outside of KO, are probably the most influential and the most important today.
These are approaches that prioritize practical matters, such as using the same classification system for several libraries and thereby facilitating centralization of classification and indexing. From this perspective, KO should be balanced between, on the one hand, adequate and updated subject knowledge and, on the other hand, the need for stability in order to avoid reclassification. The model here is the Dewey Decimal Classification system (DDC, first edition constructed by Melvil Dewey in 1876), which today is the dominant library system worldwide. (Practicalism, as described in this section, should not be confused with pragmatism, which has a deep intellectual foundation and commitment and one that is important in the domain-analytic approach, as described in 2.5 below.)
Another example is the journal classification in the citation databases: The Institute of Scientific Information (ISI) itself provides a classification of journals at the level of the database that has been based on intuitive criteria (Pudovkin & Garfield 2002; here cited from Leydesdorff 2006, 602). In other words, no research-based criteria were used, just the intuition of the classifiers.
Henry E. Bliss (1929, 1933) found that library classification should be based on what he referred to as the scientific and educational consensus. Topics should be collocated and placed in classes not according to the whim of the person who devises the classification system, but according to the standards set by scientists and educators (Drobnicki 1996, 3). It was characteristic that (a) Bliss consulted the scholarly literature and (b) he believed that one is able to detect an underlying pattern of agreement. Eugene Garfield has described Henry Bliss as a true scholar. His goals and aspirations were different from those of Melvil Dewey, whom he certainly surpassed in intellectual ability, but by whom he was dwarfed in organizational ability and drive (Garfield 1974, 291). Bliss's view of consensus probably reflected the positivism or modernism of his time. He wrote (1933, 37):
The more definite the concepts, the relations, and the principles of science, philosophy, and education become, the clearer and more stable the order of the sciences and studies in relation to learning and to life; and so the scientific and educational consensus becomes more dominant and more permanent.
Kruk (1999, 137) is among the critics of this view and wrote: "In the twentieth century knowledge is not perceived as a solid structure any more. The universal library is a utopian vision and it belongs to the same category as the universal encyclopedia and the universal language." Today, Bliss's view is contrasted by a view of knowledge that is much more concerned with conflicting interests and perspectives (cf. the domain-analytic view, 2.5). His engagement with the literature that has to be classified is, however, still an important principle.
Bliss's reception may reveal something about the hostility that serious academic work may encounter in a librarianship dominated by practicalism: "Bliss had announced his intention to develop a new general classification in the Library Quarterly in 1910. The announcement met with bitter hostility, not from Melvil Dewey (Bliss always said that his personal relations with Dewey were cordial [...]) but from some of Dewey's disciples. Bliss gradually became a rather solitary figure in the American library scene, and his later work met with apathy" (Campbell 1976, 139). Further: "Bliss's first book, The Organization of knowledge and the system of the sciences, was published in 1929 by Henry Holt & Co., New York, after he had failed to interest the American Library Association in it. Only three of Bliss's papers were ever published by the Association, and two of those were condensed [...]. The American Library Association, after negotiations lasting several years, refused to publish his second book without a generous subsidy from the author sufficient to cover all publishing costs" (Campbell 1976, 139).
Fortunately, this hostility did not hinder Bliss's recognition: "The two books [...] and the outline version of his scheme, A System of Bibliographic Classification (1935, 2nd ed., 1936) won him a reputation in many parts of the world as an original thinker of great power, and a classificationist who was not afraid to tread out new paths" (Campbell 1976, 139).
The facet-analytic paradigm is probably the most distinct approach to knowledge organization that has been developed within LIS. It is mainly attributed to S. R. Ranganathan and the British Classification Research Group, but it is fundamentally based on the principles of logical division developed more than two millennia ago (Mills 2004). Faceted systems differ from enumerative systems in that they do not list all of their classes but provide building blocks from which specific classes for each document may be formed. This approach still has a strong position in the field and it is the most explicit and pure theoretical approach to knowledge organization (KO). The strength of this approach is its logical principles and the way it provides structures in knowledge organization systems (KOS). The main weaknesses of this approach are (1) a lack of an empirical basis in its methodology (although, of course, any given faceted classification must have a basis in some empirically derived list of concepts) and (2) a speculative ordering of knowledge without a basis in the development or influence of theories and socio-historical studies. It seems to be based on the problematic assumption that relations between concepts are a priori and are therefore not established by the development of models, theories and laws (see further in Hjørland 2013b).
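The contrast between faceted and enumerative systems can be made concrete with a small sketch. The facets, foci, notations, the colon connector and the facet order below are all invented for illustration and do not reproduce the schedules of any actual classification scheme. The function `synthesize` builds a class mark for one document from building blocks, whereas `enumerate_all` shows what a scheme would have to list in advance if it enumerated every full combination.

```python
from itertools import product

# Hypothetical facets with hypothetical notations (illustration only).
FACETS = {
    "subject": {"birds": "B", "stones": "S"},
    "aspect": {"classification": "c", "history": "h"},
    "place": {"Denmark": "1", "India": "2"},
}
# An assumed fixed order in which facets are combined.
FACET_ORDER = ["subject", "aspect", "place"]


def synthesize(**foci):
    """Build a class mark for one document from the foci it actually has."""
    parts = []
    for facet in FACET_ORDER:
        if facet in foci:
            parts.append(FACETS[facet][foci[facet]])
    return ":".join(parts)


def enumerate_all():
    """List every full three-facet combination, as an enumerative scheme would."""
    notations = (FACETS[facet].values() for facet in FACET_ORDER)
    return [":".join(combo) for combo in product(*notations)]


mark = synthesize(subject="birds", aspect="classification")
listed = enumerate_all()
```

In this toy case the faceted schedule holds six building blocks, while the enumerated list already holds eight compound classes. The gap grows multiplicatively as facets and foci are added, which is one way to read the building-block idea.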
A distinction should be made between user-friendly KOS and user-based KOS. Today, it seems to be evident that KOS should be user-friendly, but this was not always the case (see Hjørland 2013c and Jensen 1973). It is not evident, however, that user-friendly systems should be produced on the basis of information collected from users or about users. Extremely successful systems such as Apple's iPhone, Dialog's search system and Google's PageRank, for example, are not based on empirical studies of users. Actually, the idea that KOS should be based on user studies (rather than, for example, on literary warrant, logical division, word statistics or scholarly theories) seems to be an unsupported hypothesis. Nonetheless, it is a family of approaches that has its supporters (for further information see Hjørland 2013c).
A core principle of the domain-analytic approach is: "The starting point for understanding classification is one that any object, any document and any domain could be classified from multiple equal correct perspectives." (Mai 2011, 723). In other words:
Different communities may be interested in the same object (e.g. a stone in the field [or a given book]) but may interpret it differently (e.g. from an archeological or geological point of view). What is informative (and thus information) depends on the point of view of the specific community. (Hjørland 2002, 116)
In contrast to consensus based approaches (2.2 above), domain analysis assumes the existence of multiple perspectives. Disagreement is common and the picture is really not one of agreement, but of conflicting schools, and the closer the neighbours the sharper the conflict (Broadfield 1946, 69). Of course, the degree of consensus is stronger in some domains than in others. Recently, a revolution has taken place in ornithology and it seems as if the new classification of birds has a very strong scientific basis and a high degree of consensus (see Fjeldså 2013). To examine the warrant for a classification is, of course, part of the domain-analytic framework. It is also important to realize that not every perspective or classification is as important as any other. One should not subscribe to relativism out of convenience, i.e., abstain from considering the strengths and the weaknesses of different perspectives or paradigms.
Ingetraut Dahlberg has expressed the view that KO is part of the metasciences:
I consider Knowledge Organization as a subdiscipline of Science of Science with application fields not only in the Information Sciences but also for all subject fields (domains) needing Taxonomies (classification systems of objects) and other fields like Statistics, Commodities, Utilities, Weapons, Patents, Museology etc.
According to Science Theory, every domain has its own area of objects and of methods and processes, next to other relationships. (Dahlberg cited from Dodebei 2014).
Hjørland (2011b) also claims the importance of the theory of knowledge for indexing and for information retrieval. Today, medical doctors often rely on systematic reviews that are based on the paradigm termed evidence-based medicine (EBM, or interdisciplinary: evidence-based practice, EBP). By implication, indexing and retrieval have to adapt to the criteria for what counts as knowledge in this paradigm. The same is, of course, the case in other fields and in the case of conflicting paradigms. In general, criteria for organizing knowledge are to be found in the subject fields, their theories and their paradigms. It is therefore important, following Dahlberg, to consider KO as a science of science.
From the domain-analytic perspective, the term KO better reflects the connection to the metasciences than does the term information organization, IO. KO points to the related fields of history, philosophy and the sociology of knowledge (among other fields). This is one argument for considering KO the preferred term (see further in Hjørland 2012b).
A model of a domain-analytic study is Ørom (2003), who identified different paradigms in art studies and compared them with major library classification systems.
Bibliometrics (with altmetrics, informetrics, scientometrics and webometrics) is an interdisciplinary field with strong affiliations to LIS. This field developed techniques for producing bibliometric maps based on co-citation analysis, bibliographic coupling, or direct citation. Such maps may serve information retrieval and are a competing or supplementary approach to knowledge organization, although the fields of KO and bibliometrics have so far not had much mutual contact. Among the main bibliometric researchers are Eugene Garfield, Henry Small and Howard D. White. Bibliometric methods are sometimes considered as being objective, but Hjørland (2013a and 2016b) argues that this is not the case and he considers the strong and weak sides of this approach to KO.
Information retrieval (IR) is, today, a term mainly related to computer science. Formerly, it had strong relations to information science, but the field has largely migrated to computer science. Among the basic assumptions and techniques of this approach is the study of statistical relations between terms, documents and collections of documents. Among the main IR researchers are Gerald Salton, Karen Spärck Jones, Stephen Robertson and C. J. Keith van Rijsbergen. Again, if the purpose of a KOS is to help users to identify relevant documents, then IR is a family of competing approaches when compared to the approaches studied by the KO community. As such, it is a very successful family of approaches. Robertson (2008) stated, "statistical approaches won, simply. They were overwhelmingly more successful [compared to other approaches such as thesauri]". This issue is further addressed in, for example, Hjørland (2016a).
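Two of the measures named above can be sketched in a few lines. The sketch assumes a tiny, invented set of citing papers and their reference lists; the paper identifiers and function names are hypothetical. Co-citation counts how often two cited items occur together in reference lists, while bibliographic coupling counts how many references two citing papers share.

```python
from collections import Counter
from itertools import combinations

# Hypothetical citation data: citing paper -> set of cited references.
references = {
    "P1": {"A", "B", "C"},
    "P2": {"A", "B"},
    "P3": {"B", "C", "D"},
}


def cocitation_counts(references):
    """Count how often each pair of cited items shares a reference list."""
    counts = Counter()
    for cited in references.values():
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts


def coupling_strength(references):
    """Count the shared references for each pair of citing papers."""
    return {
        (p, q): len(references[p] & references[q])
        for p, q in combinations(sorted(references), 2)
    }


cocited = cocitation_counts(references)
coupled = coupling_strength(references)
```

A bibliometric map would typically be derived from such pair counts by clustering or by a spatial layout. The choice of data, thresholds and clustering method involves decisions that the counts themselves do not settle.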
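As a minimal sketch of what statistical relations between terms, documents and collections can mean in practice, the following ranks three invented documents against a query using one common weighting, term frequency multiplied by inverse document frequency. The documents, the whitespace tokenizer and the exact formula are assumptions made for illustration, not a description of any particular system discussed here.

```python
import math
from collections import Counter

# Hypothetical mini-collection.
docs = {
    "d1": "library classification and shelving",
    "d2": "thesauri for indexing and retrieval",
    "d3": "statistical retrieval and term weighting",
}


def tokenize(text):
    """Split on whitespace after lowercasing (a deliberately naive tokenizer)."""
    return text.lower().split()


def tf_idf_scores(query, docs):
    """Score each document as the sum of tf * log(N / df) over the query terms."""
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    scores = {}
    for doc_id, counts in term_counts.items():
        score = 0.0
        for term in tokenize(query):
            # Document frequency: number of documents containing the term.
            df = sum(1 for c in term_counts.values() if term in c)
            if df:
                score += counts[term] * math.log(n_docs / df)
        scores[doc_id] = score
    return scores


scores = tf_idf_scores("statistical retrieval", docs)
ranking = sorted(scores, key=scores.get, reverse=True)
```

No controlled vocabulary is consulted at any point: the ranking follows from word statistics alone, which is the sense in which such techniques compete with KOS such as thesauri.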
Many other approaches exist. Here just two will be mentioned. Heinrich Herre (2013) discussed an ontological approach that provided formal specifications and harmonized the definitions of concepts used to represent the knowledge of specific domains. It made use of the onto-axiomatic method, of graduated conceptualizations, of levels of reality, and of top-level-supported methods for ontology-development.
Jack Andersen (2015) is a main representative of a genre approach to knowledge organization. He wrote: "[A]s Bazerman (2012) reminds us, while recognizing the social importance of effective search engines and other systems of structuring knowledge and inscribing writing, we still need to understand the activity contexts of those producing and using knowledge and information because no matter how fragmentary, how automatic, and how fast information comes to a user, the very user (herself/himself placed in an activity contexts [sic!]) must ultimately make sense of the information found and that sense cannot be made without understanding the various of activity (and the practices) producing that information" (Andersen 2015, 14-15).
We have now presented an overview of the approaches to KO and of the competing approaches from outside KO. It is obvious that these, as well as other approaches, need careful consideration, and that important strategic decisions are involved in this choice of theory. The future of the field of KO depends on whether the research, the teaching and the practice of the future provide helpful systems and services for given user groups, or whether existing systems like Google already provide satisfying results. A core issue is, therefore, to evaluate the relative strengths and weaknesses of different approaches. As already stated, Hjørland (2015a) argued that for serious purposes, such as for medical decisions, classical databases are still needed and that KO needs to be further developed to make searches more efficient.
Ideally, KO should be understood as being a knowledge base that can be applied to all technological platforms. However, its development has often been technology-driven. Therefore, an overview of KO on different platforms is provided in this section.
KO in libraries is mainly represented by classification systems and indexing systems such as the Dewey Decimal Classification (DDC) and the Library of Congress Subject Headings (LCSH).
Library classification systems may be developed for the double function of shelving physical documents and serving as a tool for information retrieval (IR), including browsing in printed catalogs (and, from the 1980s, in OPACs, online public access catalogs). The function as a shelving tool puts major restrictions on the design of classifications because such systems must arrange all documents in a linear sequence. This double function of classification systems may be an economic and a management advantage in some contexts, but it implies that the function of classifications as an IR tool is subject to restrictions that are unnecessary from the retrieval perspective.
While many (big) libraries have developed tailored classifications, some systems have been used by many libraries and may be considered to be kinds of standards. Among the best known library classification systems are the DDC (first published in 1876, 23rd edition published in 2011), the Library of Congress Classification (LCC), 1901- (regularly updated), and the Universal Decimal Classification (UDC), first published in 1905-1907 (latest full edition 2005).
From a research perspective, we may ask what kind of theory underlies such a KOS. It could be said that the DDC emphasizes practicalities, efficient management, and standards rather than a scholarly, theoretical approach. It is the world's most widely used library classification system, but it is not optimal for any particular collection or target group, and according to, among others, James Blake (2011, 469-470), it does not reflect current scientific knowledge. Although Blake found that such outdated classifications may still do their job well (2011, 470), this seems to reflect a lack of ambition in providing up-to-date information, and to prioritize library management issues rather than advanced IR requirements. DDC is probably the system which has meant most for the institutionalization and ideology of LIS and KO.
LCC was developed based on the collections of the Library of Congress, thus reflecting this specific collection. The major principles of this system are its basis in literary warrant and the enumeration of classes (as opposed to faceted systems). Vanda Broughton (2004, 143) wrote, "It is quite hard to discern any strong theoretical principles underlying LCC." Some formulations by S. R. Ranganathan (e.g. 1951) have also suggested that such traditional systems seem to lack a theoretical foundation (in his eyes, as opposed to his own approach). In the past, the LCC and UDC reflected current scholarly knowledge much better than the DDC (but the UDC scheme, in particular, has not generally been updated, cf. Hjørland 2007a). When it has been said that such systems lack a theoretical foundation, it can be argued that their implicit principles are
that they should reflect current subject knowledge, and that their theoretical basis should be found in the epistemological assumptions on which they reflect the subject fields covered;
that they should be based on the principle of literary warrant, first formulated by Hulme (1911), which means that they are based on the literature that they classify. The LCC, in particular, is based on classifying the books in the Library of Congress, but because of the size of the collections, it has turned out to be fruitful for many other large research libraries.
Faceted library classification systems were developed in the first half of the 20th century, as opposed to enumerative systems. The LCC is the model of an enumerative system, in which all of the classes are listed (and the system is, therefore, comprehensive; LCC fills up about 41 volumes). Faceted systems, on the other hand, do not list all of their classes, but provide building blocks from which specific classes for each document may be formed (Ranganathan was inspired by the Meccano toy). While the UDC may be considered to be a forerunner partly based on facet-analytic principles, the most well-known systems in this tradition are the Colon Classification (CC), developed by S. R. Ranganathan in 1933, and the Bliss Bibliographic Classification, 2nd ed. (BBC2), developed by Jack Mills, Vanda Broughton and others from 1977 (still in progress). While these systems represent progress in research and development, their practical influence has been disappointing, although their principles have gradually influenced other systems, including the DDC.
BBC2, the CC, the DDC, the LCC and the UDC are universal systems, covering all fields of knowledge, although some (e.g., BBC2, the LCC and the UDC) may be considered sets of domain-specific systems, which together make up a universal system. Universal systems are less important for special libraries and for scholarly subject retrieval when compared with special systems that have been designed for subject bibliographies such as MEDLINE or PsycINFO. When online bibliographic databases developed from about 1963 (cf. Hahn 1998), the development of domain-specific thesauri for online searching became a research front in KO. However, some researchers, for example, Szostak, Gnoli & López-Huertas (2016), argue that universal systems are important for interdisciplinary research. Although research is still done on library classification and indexing systems, this area has lost importance when compared with research on other kinds of KOS that are better adapted to and used by online retrieval systems.
The most used basis for organization in universal systems has been the division (or collocation) by scholarly disciplines. The DDC, for example, states that
[A] work on water may be classed with many disciplines, such as metaphysics, religion, economics, commerce, physics, chemistry, geology, oceanography, meteorology, and history. No other feature of the DDC is more basic than this, that it scatters subjects by discipline (Dewey 1979, p. xxxi).
The alternative principle, collocation by phenomena, has also sometimes been used and has its supporters (see, for example, Ahlers Møller 1981; Beghtol 2004; Brown 1914; Szostak, Gnoli & López-Huertas 2016).
During the 1980s, library catalogs became available as OPACs. This allowed users to search the catalog from remote terminals, e.g., from the users' homes. OPACs also provided better search possibilities, but to a large extent they continued to use the same kinds of KOS as were developed in the age of the card catalog.
Archival science is an independent field with its own journals, conferences, textbooks and encyclopedias (e.g., Fox and Wilkerson 1998; Duranti and Franks 2015). Knowledge organization of archives should, however, also be considered to be a part of KO, as was defined at the beginning of this article. Archives may contain official records, business records, images, letters, diplomas, etc. The most important specific principle of organization for this domain is the principle of provenance.
Provenance is a fundamental principle of archival science, referring to the individuals, groups, or or