Monday, February 09, 2004

INFORMATION AS THING by Michael Buckland, University of California.

Numerous definitions have been proposed for "information". One important use of "information" is to denote knowledge imparted; another is to denote the process of informing. Some leading theorists have dismissed the attributive use of "information" to refer to things that are informative. However, "information-as-thing" deserves careful examination, partly because it is the only form of information with which information systems can deal directly. People are informed not only by intentional communications, but by a wide variety of objects and events. Being "informative" is situational and it would be rash to state of any thing that it might not be informative, hence information, in some conceivable situation. Varieties of "information-as-thing" vary in their physical characteristics and so are not equally suited for storage and retrieval. There is, however, considerable scope for using representations instead.

WHEN IS INFORMATION NOT INFORMATION?

Even if we dismiss the argument that untrue information is not information, we could still ask what could not be information? Since being evidence, being information, is a quality attributed to things, we may well ask what limits there might be to what could or could not be information. The question has to be rephrased as "What things could not be regarded as informative?" We have already noted that a great variety of things can be regarded as informative so the range is clearly very large.

We might say that objects of which nobody is aware cannot be information, while hastening to add that they might well become so when someone does become aware of them. It is not uncommon to infer that some sort of evidence, of which we are not aware, ought to or might exist and, if found, would be of particular importance as evidence, as when detectives search, more or less systematically, for clues.

Determining what might be informative is a difficult task. Trees, for example, provide wood, as lumber for building and as firewood for heating. One does not normally think of trees as information, but trees are informative in at least two ways. Obviously, as representative trees they are informative about trees. Less obviously, differences in the thickness of tree rings are caused by, and so are evidence of, variations in the weather. Patterns reflecting a specific cycle of years constitute valuable information for archaeologists seeking to date old beams (e.g. Ottaway, 1983). But if lumber and firewood can be information, one hesitates to state categorically of any object that it could not, in any circumstances, be information or evidence. We conclude that we are unable to say confidently of anything that it could not be information.

This leads us to an unhelpful conclusion: If anything is, or might be, informative, then everything is, or might well be, information. In which case calling something "information" does little or nothing to define it. If everything is information, then being information is nothing special.

Being information is situational

Information-as-process is situational. Therefore, evidence involved in information-as-process is also situational. Hence, whether any particular object, document, data, or event is going to be informative depends on the circumstances, just as the "relevance" of a document or a fact is situational depending on the inquiry and on the expertise of the inquirer (Wilson, 1973). It follows from this that the capability of "being informative", the essential characteristic of information-as-thing, must also be situational. We may say of some object or document that in such-and-such a combination of circumstances, in such-and-such a situation, it would be informative, it would be information, i.e. information-as-thing.

But, as noted above, we could in principle say that of any object or document: One just has to be imaginative enough in surmising the situation in which it could be informative. And if one can describe anything this way, we are making little progress in distinguishing what information-as-thing is. Further, it is a matter of individual judgement, of opinion:

(1) whether some particular thing would be pertinent; and, if so,

(2) whether the probability of it being used as evidence would be significant; and, if so,

(3) whether its use as evidence would be important. (The issue might be trivial or, even if important, this particular evidence might be redundant, unreliable, or otherwise problematic.) And, if so,

(4) whether the importance of the issue, the importance of the evidence, and the probability of its being used -- in combination -- warrant the preservation of this particular evidence.

If all of these are viewed positively, then one would regard the thing -- event, object, text, or document -- as likely to be useful information and, presumably, take steps to preserve it or, at least, a representation of it.

Information by Consensus

We have shown that (i) the virtue of being information-as-thing is situational and that (ii) determining that any thing is likely to be useful information depends on a compounding of subjective judgements. Progress beyond an anarchy of individual opinions concerning what is or is not reasonably treated as information depends on agreement, or on at least some consensus. We can use an historical example to illustrate this point. It used to be considered important to know whether a woman was a witch or not. One source of evidence was trial by water. The unfortunate woman would be put in a pond. If she floated she was a witch. If she sank she was not. This event, the outcome of the experiment, was, by consensus, the information-as-thing needed for the identification of a witch. Nowadays it would be denied, by consensus, that the exact same event constituted the information that it had previously been accepted, by consensus, as being.

Where there is a consensus of judgement, the consensus is sometimes so strong that the status of objects, especially documents, being information is unquestioned, e.g. telephone directories, airline timetables, and textbooks. In these cases arguments are only over niceties such as accuracy, currency, completeness, and cost. As a practical matter some consensus is needed to agree on what to collect and store in retrieval-based information systems, in archives, data bases, libraries, museums, and office files. But because these decisions are based on a compounding of different judgements, as noted above, it is not surprising that there should be disagreement. Nevertheless, it is on this basis that data are collected and fed into databases, librarians select books, museums collect objects, and publishers issue books. It is a very reasonable prediction that copies of the San Francisco telephone directory will be informative, though there is no guarantee that each and every copy will necessarily be used.

"Information-as-thing", then, is meaningful in two senses: i) At quite specific situations and points in time an object or event may actually be informative, i.e. constitute evidence that is used in a way that affects someone's beliefs; and (ii) Since the use of evidence is predictable, albeit imperfectly, the term "information" is commonly and reasonably used to denote some population of objects to which some significant probability of being usefully informative in the future has been attributed. It is in this sense that collection development is concerned with collections of information.


Information as evidence

One learns from the examination of various sorts of things. In order to learn, texts are read, numbers are tallied, objects and images are inspected, touched, or otherwise perceived. In a significant sense information is used as evidence in learning - as the basis for understanding. One's knowledge and opinions are affected by what one sees, reads, hears, and experiences. Textbooks and encyclopedias provide material for an introduction; literary texts and commentaries provide sources for the study of language and literature; arrays of statistical data provide input for calculations and inference; statutes and law reports indicate the law; photographs show what people, places, and events looked like; citations and sources are verified; and so on. In each case it is reasonable to view information-as-thing as evidence, though without implying that what was read, viewed, listened to, or otherwise perceived or observed was necessarily accurate, useful, or even pertinent to the user's purposes. Nor need it be assumed that the user did (or should) believe or agree with what was perceived. "Evidence" is an appropriate term because it denotes something related to understanding, something which, if found and correctly understood, could change one's knowledge, one's beliefs, concerning some matter.

Further, the term "evidence" implies passiveness. Evidence, like information-as-thing, does not do anything actively. Human beings do things with it or to it. They examine it, describe it, and categorize it. They understand, misunderstand, interpret, summarize, or rebut it. They may even try to fake it, alter it, hide it, or destroy it. The essence of evidence is precisely that perception of it can lead to changes in what people believe that they know.

Dictionary definitions of "evidence" include: "An appearance from which inferences can be drawn; an indication, mark, sign, token, trace. ... Ground for belief; testimony or facts tending to prove or disprove any conclusion. ... Information, whether in the form of personal testimony, the language of documents, or the production of material objects, that is given in a legal investigation." (Oxford English Dictionary, 1989, vol. 4, p. 469). If something cannot be viewed as having the characteristics of evidence, then it is difficult to see how it could be regarded as information. If it has value as information concerning something, then it would appear to have value as evidence of something. "Evidence" appears to be close enough to the meaning of information-as-thing to warrant considering its use as a synonym when, for example, describing museum objects as "authentic historic pieces of evidence from nature and society." (Schreiner, 1985, p. 27).

One area in which the term "evidence" is much used is in law. Much of the concern is with what evidence -- what information -- can properly be considered in a legal process. It is not sufficient that information may be pertinent. It must also have been discovered and made available in socially approved ways. However, if we set aside the issues of the propriety of the gathering and presentation of evidence and ask what, in law, evidence actually is, we find that it corresponds closely to the way we are using it here. In English law, evidence can include the performing of experiments and the viewing of places and is defined as: "...First, the means, apart from argument and inference, whereby the court is informed as to the issues of fact as ascertained by the pleadings; secondly the subject matter of such means." (Buzzard et al., 1976, p. 6; also Wigmore, 1983).



TYPES OF INFORMATION

Pursuing the notion of information as evidence, as things from which one becomes informed, we can examine more specifically what sorts of things this might include.

Data

"Data", as the plural form of the Latin word "datum", means "things that have been given." It is, therefore, an apt term for the sort of information-as-thing that has been processed in some way for use. Commonly "data" denotes whatever records are stored in a computer. (See Machlup (1983, p. 646-649) for a discussion of the use and mis-use of the term "data".)

Text and documents

Archives, libraries, and offices are dominated by texts: papers, letters, forms, books, periodicals, manuscripts, and written records of various kinds, on paper, on microform, and in electronic form. The term "document" is normally used to denote texts or, more exactly, text-bearing objects. There seems no reason not to extend the use of "text" and "document" to include images, and even sounds intended to convey some sort of communication, aesthetic, inspirational, instrumental, whatever. In this sense, a table of numbers can be considered as text, as a document, or as data. Text that is to be analyzed statistically could also be regarded as data. There is a tendency to use "data" to denote numerical information and to use text to denote natural language in any medium.

Further confusion results from attempting to distinguish two types of retrieval by making and compounding two unwarranted assumptions about "data" and "document": (i) that "data retrieval" should denote the retrieval of records that one wishes to inspect and "document retrieval" should denote references to records that one may wish to inspect; and (ii) that "data retrieval" would be a "known item" search, but that "document retrieval" would be a "subject search" for an unknown item (van Rijsbergen, 1979, p. 2; Blair, 1984). The former assumption imposes an odd definition on both terms. The second is illogical and contrary to practical experience (Buckland, 1988b, pp. 85-87). It is wise not to assume any firm distinction between data, document, and text.

Objects

The literature on information science has concentrated narrowly on data and documents as information resources. But this is contrary to common sense. Other objects are also potentially informative. How much would we know about dinosaurs if no dinosaur fossils had been found? (Cf. Orna and Pettit (1980, p. 9), writing about museums: "In the first stage, the objects themselves are the only repository of information.") Why do centers of research assemble many sorts of collections of objects if they do not expect students and researchers to learn something from them? Any established university, for example, is likely to have a collection of rocks, a herbarium of preserved plants, a museum of human artifacts, a variety of bones, fossils, and skeletons, and much else besides. The answer is, of course, that objects that are not documents in the normal sense of being texts can nevertheless be information resources, information-as-thing. Objects are collected, stored, retrieved, and examined as information, as a basis for becoming informed. One would have to question the completeness of any view of information, information science, or information systems that did not extend to objects as well as documents and data. In this we, like Wersig (1979), go further than Machlup (1983, p. 645) who, like Belkin & Robertson (1976), limited information to what is intentionally told: "Information takes at least two persons: one who tells (by speaking, writing, imprinting, signalling) and one who listens, reads, watches." Similarly Heilprin (1974, p. 124) stated that "information science is the science of propagation of meaningful human messages." Fox (1983) took an even narrower view, examining information and misinformation exclusively in terms of propositional sentences.
Brookes (1974), however, was less restrictive: "I see no reason why what is learned by direct observation of the physical environment should not be regarded as information just as that which is learned by observing the marks on a document." Wersig (1979) adopted an even broader view of information as being derived from three sources: (i) "Generated internally" by mental effort; (ii) "Acquired by sheer perception" of phenomena; and (iii) "Acquired by communication." We view "information-as-thing" as corresponding to Wersig's phenomena (ii) and communications (iii).

Some informative objects, such as people and historic buildings, simply do not lend themselves to being collected, stored, and retrieved. But physical relocation into a collection is not always necessary for continued access. Reference to objects in their existing locations creates, in effect, a "virtual collection." One might also create some description or representation of them: a film, a photograph, some measurements, a directory, or a written description. What one then collects is a document describing or representing the person, building, or other object.

What is a document?

We started by using a simple classification of information resources: data, document, and object. But difficulties arise if we try to be rigorous. What, for example, is a document? A printed book is a document. A page of hand-writing is a document. A diagram is a document. A map is a document. If a map is a document, why should not a three-dimensional contour map also be a document? Why should not a globe also be considered a document since it is, after all, a physical description of something? Early models of locomotives were made for informational not recreational purposes (Minns, 1973, p. 5). If a globe, a model of the earth, is a document, why should one not also consider a model of a locomotive or of a ship to be a document? The model is an informative representation of the original. The original locomotive or ship, or even a life-size replica, would be even more informative than the model. "The few manuscript remains concerning the three ships that brought the first settlers to Virginia have none of the power to represent that experience that the reconstructed ships have." (Washburn, 1964). But by now we are rather a long way from customary notions of what a document is.

The proper meaning of "document" has been of concern to information scientists in the "documentation" movement, seeking to improve information resource management since the beginning of the twentieth century. The documentalist's approach was to use "document" as a generic term to denote any physical information resource rather than to limit it to text-bearing objects in specific physical media such as paper, papyrus, vellum, or microform. Otlet and others in the documentation movement affirmed:

(1) That documentation (i.e. information storage and retrieval) should be concerned with any or all potentially informative objects;

(2) that not all potentially informative objects were documents in the traditional sense of texts on paper; and

(3) that other informative objects, such as people, products, events and museum objects generally, should not be excluded. (Laisiepen, 1980). Even here, however, except for Wersig's contribution (Wersig, 1980), the emphasis is, in practice, on forms of communication: data, texts, pictures, inscriptions.

Otlet (1934, p. 217), a founder of the documentation movement, stressed the need for the definition of "document" and documentation (i.e. information storage and retrieval) to include natural objects, artefacts, objects bearing traces of human activities, objects such as models designed to represent ideas, and works of art, as well as texts. The term "document" (or "documentary unit") was used in a specialized sense as a generic term to denote informative things. Pollard (1944) observed that "From a scientific or technological point of view the [museum] object itself is of greater value than a written description of it and from the bibliographical point of view it should be regarded therefore as a document." A French documentalist defined "document" as "any concrete or symbolic indication, preserved or recorded, for reconstructing or for proving a phenomenon, whether physical or mental." ("Tout indice concret ou symbolique, conservé ou enregistré, aux fins de représenter ou de prouver un phénomène ou physique ou intellectuel." (Briet, 1951, p. 7)). On this view objects are not ordinarily documents but become so if they are processed for informational purposes. A wild antelope would not be a document, but a captured specimen of a newly discovered species that was being studied, described, and exhibited in a zoo would not only have become a document, but "the catalogued antelope is a primary document and other documents are secondary and derived." ("L'antilope cataloguée est un document initial et les autres documents sont seconds ou dérivés." (Briet, 1951, p. 8)). Perhaps only a dedicated documentalist would view an antelope as a document. But regarding anything informative as a "document" is consistent with the origins and early usage of the word, which derived from the Latin verb docere, to teach or to inform, with the suffix "-ment" denoting means. Hence "document" originally denoted a means of teaching or informing, whether a lesson, an experience, or a text.
Limitation of "document" to text-bearing objects is a later development (Oxford English Dictionary, 1989, vol. 4, p. 916; Sagredo & Izquierdo, 1983, pp. 173-178). Even among documentalists, however, including anything other than text-bearing objects in information retrieval appears to occur only in theoretical discussions and not always then (Rogalles von Bieberstein, 1975, p. 12). Meanwhile the semantic problem remains: What generic term for informative things is wide enough to include, say, museum objects and other scholarly evidence, as well as text-bearing objects? Objecting to the use of "information" or of "document" for this purpose does not remove the need for a term.

Most documents in the conventional usage of the word -- letters, books, journals, etc. -- are composed of text. One would include diagrams, maps, pictures, and sound recordings in an extended sense of the term "text". Perhaps a better term for texts in the general sense of artifacts intended to represent some meaning would be "discourse". We could also characterize these texts as "representations" of something or other. However, we could hardly regard an antelope or a ship as being "discourse". Nor are they representations in any ordinary sense. Their value as information or evidence derives from what they signify about themselves individually or, perhaps, about the class or classes of which they are members. In this sense they represent something and, if not a representation, they could be viewed as representative. If an object is not representative of something, then it is not clear how far it can signify anything, i.e. be informative.

One might divide objects into artifacts intended to constitute discourse (such as books), artifacts that were not so intended (such as ships), and objects that are not artifacts at all (such as antelopes). None of this prevents any of these from being evidence, from being informative concerning something or other. Nor does it prevent people from making uses different from that which may have been intended. A book may be treated as a doorstop. Illuminated initial letters on medieval manuscripts were intended to be decorative, but have become a major source of information concerning medieval dress and implements.

"Natural sign" is the long-established technical term in philosophy and semiotics for things that are informative but without communicative intent (Clarke, 1987; Eco, 1976).

Events

We also learn from events, but events lend themselves even less than objects do to being collected and stored in information systems for future edification. How different the study of history would be if they could! Events are (or can be) informative phenomena and so should be included in any complete approach to information science. In practice we find the evidence of events is used in three different ways:

1. Objects, which can be collected or represented, may exist as evidence associated with events: bloodstains on the carpet, perhaps, or a footprint in the sand;

2. There may well be representations of the event itself: photos, newspaper reports, memoirs. Such documents can be stored and retrieved; and, also,

3. Events can, to some extent, be created or re-created. In experimental sciences, it is regarded as being of great importance that an experiment -- an event -- be designed and described in such a way that it can be replicated subsequently by others. Since an event cannot be stored and since accounts of the results are no more than hearsay evidence, the feasibility of re-enacting the experiment so that the validity of the evidence, of the information, can be verified is highly desirable.

Regarding events as informative and noting that, although events themselves cannot be retrieved, there is some scope for recreating them, adds another element to the full range of information resource management. If the recreated event is a source of evidence, of information, then it is not unreasonable to regard the laboratory (or other) equipment used to re-enact the event as being somehow analogous to the objects and documents that are usually regarded as information sources. In what senses does it matter whether the answer to an inquiry derives from records stored in a data base or from re-enacting an experiment? What significant difference is there for the user of logarithms between a logarithmic value read from a table of logarithms and a logarithmic value newly calculated as and when needed? The inquirer might be wise to compare the two, but would surely regard both as being equally information. Indeed it would be a logical development of current trends in the use of computers to expect a blurring of the distinction between the retrieval of the results of old analyses and the presentation of the results of a fresh analysis.

To include objects and events, as well as data and documents, as species of information is to adopt a broader concept than is common. However, if we are to define information in terms of the potential for the process of informing, i.e. as evidence, there would seem no adequate ground for restricting what is included to processed data and documents as some would prefer, e.g. by defining information as "Data processed and assembled into a meaningful form." (Meadows, 1984, p. 105). There are two difficulties with such a restricted definition:

Firstly, it leaves unanswered the question of what to call other informative things, such as fossils, footprints, and screams of terror. Secondly, it raises the additional question of how much processing and/or assembling is needed for data to be called information. In addition to these two specific difficulties there is the more general criterion that, other things being equal, a simpler solution is to be preferred to a more complicated one. Therefore we retain our simpler view of "information-as-thing" as being tantamount to physical evidence: Whatever thing one might learn from (cf. Orna & Pettit, 1980, p. 3). Fortunately there are moves in the English-language literature of information retrieval toward a more ecumenical approach to information and information systems (Bearman, 1989).