II COMPUTER-BASED LEGAL INFORMATION RETRIEVAL - A HISTORICAL SURVEY

[Page 55 ]


4 Beginnings

4.1 INTRODUCTION

Computers have now been used in legal information systems for more than 15 years. This may be too short a time to justify the use of a dignified word like "history". But these years have been hectic; a number of projects have been launched, a number of systems have been created. The literature makes frequent references to systems or projects, but it is often difficult to relate these to each other. To most people the picture of the development remains an unsolved puzzle.

We have ourselves faced this difficulty in our attempt to sketch a coherent, although stylized, picture of the development. We have had the opportunity to draw on a great number of sources, but the documentation is still unsatisfactory. For one thing, Norway has been rather remote from the center of the activities. Also, documentation often comes in the form of informal reports etc. which are not easily available.

There may therefore be lacunae in our exposé, but we still think it may serve a useful function with respect to the more general discussions in this book. It may also serve as some sort of explanation of all the acronyms used as names of projects or systems, which may be unfamiliar to the reader.

Actually it is difficult to distinguish between projects and systems. In some cases, a general text retrieval program is available from a software manufacturer, and this program is used in a number of projects. In other cases, the project is mainly concerned with the development (and marketing) of a system.

4.2 BACKGROUND

4.2.1 Libraries and indexes

As we have pointed out above, legal reference retrieval corresponds to traditional legal research. Lawyers have traditional tools to assist them

[Page 57 ]


in this research. The most important is, of course, the law library. The law librarian has a number of methods of finding the material that the user has requested, for instance catalogues organized systematically or alphabetically (according to author, title, key-word, etc.).

The accumulation of volumes in American law libraries seems to have increased drastically around 1930 (Allen/Brooks/James 1962:22, Lawlor 1962:300-305). A few years later the application of computers to administrative systems became a realistic alternative. Computer-based library systems were created (cfr. Hayes/Becker 1970) and were also as a matter of course considered for law libraries (cfr. for instance Allen/Brooks/James 1962:23-81, Veaner 1971).

The problems of library systems are not, however, identical with the problems of reference retrieval. A library may use a computer-based system for the administration of its collection of books: which books are lent, to whom, etc. Also, queries will often be formulated in such a way that fact retrieval - not reference retrieval - will be the adequate response: do you have a book by this author, a book with such and such a title? etc.

The catalogues and indexes of a library will, however, often be used as a reference retrieval system. A query must then be formulated to correspond with the search criteria in the catalogue or index: a systematic number, name of an author, an indexing term, etc. An obvious step would be to convert the manual catalogues and indexes to an automatic system.

Actually special indexes have been developed to satisfy the need of the lawyer. Most legal publications (for instance compilations of statutes and law reports) have extensive and well-developed indexes. Such indexes may in fact be publications in their own right, covering for instance the decisions by a certain court over a certain period of time, etc.

In Norway, which certainly is not very well developed in this respect, there has, for instance, been published an index of all statutes cited in cases published in the major law reports.

In the USA, where case law plays an important role in the legal system, a number of indexes, digests, etc. have been developed which aid the lawyer in his search for precedents. The major example is the Key Classification System, developed by West Publishing Co. This system includes 420 subjects, which have their own term and key. Each subject is in turn divided into parts and numbers. The subject "Contracts" is for instance divided into 6 parts and 356 Key Numbers.

[Page 58 ]


An example (given by Lando 1968:175-176): Under the subject "Contracts", Part II "Construction & Operation", and the Key Number 144 "What Law Governs" several cases are cited. One example is:

"N.Y. Under 'center of gravity' or 'grouping of contacts' theory of Conflicts of Laws, the court lays emphasis on law of place which has most significant contacts with matter in dispute rather than place of making or performing the contract thereby giving place having most interest in the problem paramount control over legal issues arising, Auten v. Auten 124 N.E. 2d 99, 308 N.Y. 155."

The source itself (the case cited) must be consulted if the user wants to control the interpretation in the digest or make more extensive use of the case.

One might have thought that indexes like the Key Classification System would have been transmuted into computer-based retrieval systems. Within other fields, we have seen such a development - for instance the development of Index Medicus into MEDLARS and MEDLINE (cfr. for instance Fagerheim 1972).

Circumstances did, however, pick another point of departure.

4.2.2 Jurimetrics

The word jurimetrics was coined by Lee Loevinger in his now classical article "Jurimetrics - The Next Step Forward", where he defines it as "the scientific investigation of legal problems" (1949:31). In his survey of the problems of jurimetrics (1949:32-36) he mainly advocated an empirical approach to legal problems - and many of these problems are today included in the sociology of law (for instance the behavior of judges).

Loevinger's paper does, however, serve as an exponent of the interest lawyers were beginning to take in logic and the law. This interest is not of a recent date. George Boole, the inventor of Boolean logic (which is basic to most text retrieval systems, cfr. below at section 10.5.5), used a problem in Jewish dietary law as a simple example in his An Investigation of the Laws of Thought (1854), and the problems of logic and law have been discussed in legal philosophy (cfr. the historical survey and discussion in Sundby 1974:50-76). The use of computers, which demands a high degree of discipline and logic from programmers, did however, in one sense give the problems of logic and law a new dimension: deontic logic may in fact be translated into computer programs, and the possibility of constructing decision-automats ("deontic machines") was created. A number of papers discussing the application of modern logic to law were published (cfr. the survey of Lawlor 1962:305-310). Especially important for the later development were the papers published by Layman E. Allen, commencing with his article in Yale Law Journal in 1957 "Symbolic logic:

[Page 59 ]


A razor-edged tool for drafting and interpreting legal documents". In 1959 Allen started editing the magazine named M.U.L.L. (Modern Uses of Logic in Law), which was published by the Electronic Data Retrieval Committee established by the American Bar Association in the same year, now the Standing Committee on Law and Technology. This journal was dedicated to papers discussing the application of the scientific method to law, and has played an important part as a vessel for creating interest also for computers and law (cfr. Tapper 1963:132). M.U.L.L. is still published under the title Jurimetrics Journal.

With respect to the development of computer-based retrieval systems, the development so briefly sketched above is only of derivative interest. But we believe it was important in the way the attention of lawyers was turned toward "the application of the scientific method to law". One of the exciting, new tools science offered the post-war society was the electronic computer. And the application of the computer to law was motivated in part by the work of Allen, Loevinger, and others. But the computer was not put to use as a "deontic machine", but rather as a research tool; a distinction we stressed above in section 2.3.

The first modern, electronic computer - ENIAC - was operational in the summer of 1946. That same year Lewis O. Kelso pointed out1:

1 "Does the law need a technical revolution?" in Rocky Mountain Law Review 1946:378 - cited after Lawlor 1962:310.

"Today the lawyer works substantially as he worked before the industrial revolution. Only automated legal research will save him from playing one of the most confused, ill-paid and unsatisfactory professions in the world of tomorrow."

Kelso's proposal is believed to be the earliest suggestion for creating automatic retrieval systems to assist legal research. His suggestion was stimulated by the work of Dr. Vannevar Bush, who had advocated mechanical searching methods in scientific fields, and Kelso suggested a "Law-dex"-system based on the use of punched cards.

Another suggestion was made in 1955 by Vincent P. Biunno of the New Jersey Law Institute (Lawlor 1962:311). He proposed to enter legal information on a tape which was to be moving continuously past a number of read-out stations. The information might then be retrieved by different lawyers more or less at the same time.

These examples are not mere curiosities; they illustrate the fact that lawyers at that time were looking for new ways to ease the workload represented by legal research. This interest - and the need that created it

[Page 60 ]


- would fuse with some of the ideas generated by jurimetrics: the first computer-based legal information system, working on full text and allowing queries to be formulated in Boolean logic, would be created.

An early description of a computer-based on-line information system is given in the short-story "How-2" by the American science fiction writer Clifford D. Simak. The story, which was published in Galaxy 1954, tells of a lawyer who one morning discovers a box with a do-it-yourself kit inside. Following the instructions for use, he builds himself a robot - which happens to be mis-delivered from the future. The following mixup sends the lawyer before the court. But his friendly robot spends the night before the trial building a new robot - a lawyer-robot.

'"(A lawyer robot) with a far greater memory capacity than any of the others and with a brain-computer that operates on logic. That's what law is, isn't it - logic?'

'I suppose it is,' said Lee. 'At least it's supposed to be... It just wouldn't work. To practice law, you must be admitted to the bar. To be admitted to the bar, you must have a degree in law and pass an examination and, although there's never been an occasion to establish a precedent, I suspect the applicant must be human.'...

'All they'd need to do would be read the books,' said Albert. 'Ten seconds to a page or so. Everything they read would be stored in their memory cells.'...

Lee scrubbed his chin with a knotted fist and the light of speculation was growing in his eyes. 'It might be worth a try. If it works, though, it'll be an evil day for jurisprudence.'"

Whether Lee's last words are prophetic or not remains to be seen.

4.3 THE INITIATIVE

4.3.1 The Pittsburgh Project

The University of Pittsburgh started developing a Data Processing and Computing Center in 1955. In the course of four years, systems for information retrieval had become an integral part of the Center, which at that time had changed from an IBM 650 to an IBM 7070 and a supplementary IBM 1401 (Asher/Kurfeerst 1963:4).

At approximately the same time, Professors John F. Horty and William B. Kehl started a project in the Graduate School of Public Health, designed to study and improve the health statutes of Pennsylvania (Asher/Kurfeerst 1963:4). In 1956 the Health Law Center under the direction of Horty undertook the writing of a manual on the subject of hospital law. They looked at material from several states, and found that there was little uniformity in indexing from state to state. Therefore special indexes had to be developed. Problems increased as the project moved into wider areas of "health law", and the project looked to the Computing Center of the University for a solution (Loevinger 1963:10).

A special assignment proved to be a kind of turning point. A state

[Page 61 ]


legislator in Pennsylvania had a bill passed to change the expression "retarded child" to "exceptional child". In order to implement the bill, all instances where the expression occurred had to be located.

This type of revision of the statutes is certainly trivial, but it is nevertheless time-consuming to look for all occurrences of the relevant expression. Such changes seem to be necessary from time to time - recent examples from Norway are the change of the expression for "lawyer" ("sakfører" was changed to "advokat"), and for "social security office" ("trygdekasse" was changed to "trygdekontor"). Cfr. Lovteknikk 1971:16.

The Health Law Center started out to solve this problem in the traditional way; they paid a group of students to read through the statutes and make a note of all occurrences of the relevant expressions. It turned out that the inaccuracy was too high to be acceptable, so another group of students was hired to reread the material. Still there were errors.

A more radical method was then adopted. The entire material was registered on punch cards and verified by doublepunch. When a machine-readable copy of the material was established, it became a trivial task for the computer to read through the material and retrieve all occurrences where the word "retarded" preceded the word "child" or variations of "child" (Link 1973:6-7).
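In present-day terms, the search described above amounts to a simple positional pattern match. The following Python sketch is only an illustration under stated assumptions and not a reconstruction of the programs used at Pittsburgh: the pattern, the function name find_occurrences and the sample sections are all hypothetical, and "preceded" is assumed here to mean "immediately preceded".

```python
import re

# Assumption: "retarded" directly followed by "child" or a variant of it
# ("children", "childhood", ...). The original system may have allowed
# intervening words; this sketch does not.
PATTERN = re.compile(r"\bretarded\s+child\w*", re.IGNORECASE)


def find_occurrences(sections):
    """Return (section_id, matched_phrase) pairs for every hit, in input order."""
    hits = []
    for section_id, text in sections.items():
        for match in PATTERN.finditer(text):
            hits.append((section_id, match.group(0)))
    return hits


# Invented sample text, not taken from the Pennsylvania statutes.
SAMPLE = {
    "sec. 1": "Every retarded child shall be entitled to instruction.",
    "sec. 2": "Classes for retarded children may be established.",
    "sec. 3": "A child who is ill shall be excused from attendance.",
}

for section_id, phrase in find_occurrences(SAMPLE):
    print(section_id, "->", phrase)
```

Once the text exists in machine-readable form, the same pass can be repeated for any other expression at negligible extra cost, which is the by-product the next paragraph turns to.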

The result was not only a satisfactory solution to the original assignment; as a by-product, the Health Law Center got the full text of the statutes in machine-readable form. And Horty found other and more exciting ways of exploiting this material. Actually it was the beginning of the full-text retrieval systems which today are predominant in computer-based legal information retrieval.

Horty was not the only one at this time with an interest in computer applications to legal research. For instance, conventional book indexes were converted into machine-readable form (Link 1973:7), and twenty years of headnotes of design patent law cases had been stored on the magnetic discs of an IBM 305 RAMAC for retrieval purposes, an experiment carried out by Donald A. Andrews, who at that time was Director of Research and Development of the United States Patent Office (Lawlor 1962:313, Tapper 1963:133). Other projects are briefly mentioned by Loevinger 1963:13-21.

The full-text approach did, however, entail some advantages for legal research - one being the appeal to lawyers, who could conduct their retrieval against the sources themselves, not against an index or an abstract. At this time, the relative performance of different methods was not very well known, but there was a general attitude of optimism in

[Page 62 ]


regard to the full-text approach, a hope that the potential of this method might be higher than the index or abstract approach.

We also think that the retrieval strategies developed by Horty in order to overcome the specificity of the full text may to some degree have caused this attitude. Horty suggested that a question should be divided into "concepts", and that each "concept" should be described by a list of search-words. (Cfr. examples in Horty 1960:3-5, Lawlor 1962:315-320, Loevinger 1963:11-13.) This strategy does have some important traits in common with the conceptor-based retrieval strategy discussed in section 10.5.5 as developed by the Norwegian Research Center for Computers and Law.

Loevinger (1963:11-12) discusses an example where a query is constructed for the question: "What are the rights of illegitimate children and the duties owed to them by their parents under Pennsylvania Statutes?" This question is split into three "concepts", the first being 'baby', the second 'parent' and the third 'illegitimate'. Loevinger lists the search-words used to describe each of these "concepts"; for 'baby' he lists:

  • baby
  • babies
  • child
  • children
  • foundling
  • infant
  • minor
  • offspring

All these search-words may be used in the query combined with the Boolean operator or, and the three lists of search-words (where all words in the list are linked with or) are combined with the Boolean operator and. Documents which contain at least one word from each of the three lists are retrieved.

It may be noticed that the words occurring in one list are mainly synonyms. But they are only synonyms in the context of the question. As words representing 'parent' Loevinger for instance lists both "mother" and "father", two words which in another context (for instance in a discussion of sex roles) certainly will not be synonyms. Such words, which may be considered equivalent for search purposes, have been termed searchonyms by Richard P. C. Hayden (cfr. Lawlor 1962:318).
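The search strategy described above, or within each "concept" and and between the "concepts", may be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Horty's program: the 'baby' list is the one quoted above, the 'parent' list contains only the two words mentioned in the text plus two assumed ones, the 'illegitimate' list is a placeholder of our own, and the names CONCEPTS, tokenize, matches and search as well as the sample documents are hypothetical.

```python
import re

# One list of search-words ("searchonyms") per concept.
CONCEPTS = {
    "baby": {"baby", "babies", "child", "children",
             "foundling", "infant", "minor", "offspring"},
    # Only "mother" and "father" are named in the text; the rest is assumed.
    "parent": {"mother", "father", "parent", "parents"},
    # Placeholder list; the search-words actually used are not quoted here.
    "illegitimate": {"illegitimate", "illegitimacy"},
}


def tokenize(text):
    """Return the set of lower-cased words occurring in a document."""
    return set(re.findall(r"[a-z]+", text.lower()))


def matches(text, concepts=CONCEPTS):
    """True if the text holds at least one word from EVERY concept list."""
    words = tokenize(text)
    # 'or' inside a list: any shared word will do.
    # 'and' between lists: every list must contribute one.
    return all(words & terms for terms in concepts.values())


def search(documents, concepts=CONCEPTS):
    """Return the ids of all documents satisfying the query."""
    return [doc_id for doc_id, text in documents.items()
            if matches(text, concepts)]


# Invented sample documents, not statutory text.
DOCUMENTS = {
    "s1": "The father of an illegitimate child shall provide support.",
    "s2": "Every minor shall attend school.",
    "s3": "The mother and father shall have custody of their children.",
}

print(search(DOCUMENTS))
```

Only the first sample document is retrieved: the second lacks a 'parent' and an 'illegitimate' word, and the third lacks an 'illegitimate' word. Lengthening a list can only widen the search, while adding a further concept can only narrow it.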

The batch-oriented system for storage and retrieval of statutes was named the "Key Word in Combination"1 approach by Horty (Loevinger 1963:12), and was successfully demonstrated at an American Bar Association conference in 1960.

1 KWIC is more commonly used to refer to a Key-Words-In-Context listing (first developed by H. P. Luhn, IBM), which is an output or index format.

[Page 63 ]


4.3.2 Aspen Systems Corporation

The system developed at the Health Law Center was in 1968 made into the nucleus of a private company, Automated Law Searching - later renamed Aspen Systems Corporation (Aitken/Campbell/Morgan 1972:31, Prestel 1971:19). Aspen rapidly expanded and diversified its activities. It converted into machine-readable form the United States Code and the statutes of all states, and in addition regulatory law, local ordinances, case law etc.

Among the services offered by Aspen were also a bill-drafting service (QUIK-DRAFT), bound KWIC-indexes, production of statutes in force in printed form, etc. Aspen acquired as a subsidiary Computype Corporation (St. Paul, Minnesota) in order to utilize the machine-readable material for photo-composition and typesetting (Tapper 1970:22-23).

The first to have emphasized the close relationship between a legal retrieval system and a system for printing legal material, seems to have been Professor John Lyons of George Washington University. In 1966 he founded Autocomp Inc. mainly to develop and combine full-text retrieval and photo-composition techniques. (Cfr. Prestel 1971:21, Tapper 1973:186-188.) A brief description is given by Lyons 1969:60-63.

Aspen started as a corporation with special know-how of full-text retrieval techniques, acquired through the early experiments at Pittsburgh University. Other systems were launched by Aspen (for instance LITE, cfr. below), and Aspen also lent a hand in the development of integrated retrieval and publishing systems. By 1970 Aspen was providing various legal information services to 20 states, and in addition serving a number of public agencies and private corporations, cfr. Tapper 1973:89. The terminal-oriented full-text retrieval systems developed during the last part of the 1960s seem to have reduced the importance of Aspen's retrieval services - but there was still a market for its services in the field of publishing of legal material, especially annotated statutes or codes.

In the early 1970s, Aspen was acquired as a subsidiary of American Can Co., and Horty left Aspen and returned to private practice (Aitken/Campbell/Morgan 1972:31). Horty's early experiments resulted in a decade of activity and exploration. It is rather fitting that this epoch should end with the divorce of Aspen and Horty.

4.3.3 From LITE to FLITE

One of the most important projects sired by Professor Horty and Aspen is the LITE system.

As early as 1961, the Office of the Staff Judge Advocate at the Air Force

[Page 64 ]


Accounting and Finance Center began to look into the possibilities of using computer-based retrieval systems. In August 1962 a proposal was made to the Headquarters USAF, recommending the conversion from manual to computer-based research. The Judge Advocate General also recommended that such a system should be based on the full text of the documents. The proposal was approved, and the Office of the Staff Judge Advocate, Air Force Accounting and Finance Center, was given the responsibility for the development and testing of the LITE system (Air Force Letter No. 177-6, 13. November 1963).

The name of the system - LITE - is an acronym for Legal Information Thru Electronics.

Originally the data base of the LITE system consisted of statutory and regulatory material and some case law. During the years of operation, additional material has been made available and around 1970 the LITE data base was probably the largest collection of machine-readable legal sources in existence (Tapper 1970:26). Later, however, other systems (most notably the LEXIS system) have eclipsed LITE in this respect. In 1974 the total data base comprised some 106,600,000 text words.

LITE was launched from the pad prepared by Aspen, but has - during the years of operation - been modified. The search strategies are, however, still based on Boolean and positional logic (cfr. Rognlien 1971:25-29), and the system is still batch-oriented, normal response time being one day to run the query and additional time for communicating with the LITE center at Denver, Colorado. The users are almost exclusively within the Defence Department (though the system in principle is available to the Congress, the President's Office, and the Supreme Court). The users are scattered over a wide geographical area, making conversion to a terminal-oriented system impractical at present - though terminals will probably be installed at the Center in Denver.

At an early stage, LITE was put through a series of tests aimed at assessing the effectiveness of the system. The test lasted six months in 1964 and was conducted on a data base of approximately 17 million words. Queries processed by LITE were compared with the results of parallel manual research by the users - amounting to a total of 215 separate questions. The result of the test has been presented as a confrontation between man and the computer:

  • in 7.5 per cent of the total searches the computer retrieved fewer relevant citations than were discovered by manual research

[Page 65]


  • in 44.1 per cent of the total searches the same number of relevant citations were retrieved by both methods
  • in 48.4 per cent the LITE system retrieved more relevant citations than were discovered manually.

Summing up the result, Davis (1966:9) argues that the LITE system worked on a 92.5 per cent effectiveness rate (the ratio at which computer-based retrieval equaled or transcended manual retrieval), while manual retrieval worked at a 51.6 per cent effectiveness rate.
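The two "effectiveness rates" follow directly from the three percentages listed above, as the small Python sketch below shows. It is only a check of the arithmetic; the dictionary keys and the function name effectiveness_rates are our own labels and do not stem from Davis.

```python
# Shares of the total searches in the 1964 test, in per cent, as quoted above.
SHARES = {
    "computer_fewer": 7.5,   # LITE retrieved fewer relevant citations
    "same": 44.1,            # both methods retrieved the same number
    "computer_more": 48.4,   # LITE retrieved more relevant citations
}


def effectiveness_rates(shares):
    """Return (computer_rate, manual_rate) in per cent.

    A method counts as "effective" in a search when it retrieved at least
    as many relevant citations as the other method.
    """
    computer = round(shares["same"] + shares["computer_more"], 1)
    manual = round(shares["computer_fewer"] + shares["same"], 1)
    return computer, manual


computer_rate, manual_rate = effectiveness_rates(SHARES)
print(computer_rate, manual_rate)
```

Note that the searches in which both methods did equally well are counted in both rates, which is why the two figures add up to more than 100 per cent.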

The results, as presented here and by Davis, should be interpreted in a critical perspective. In the test, one seems to have focused on the recall capabilities of the system, while the precision capabilities have not been illuminated. Other variables may also have influenced the result, such as different coverage of the data base available to the LITE system and the manual researchers. The recall capabilities of LITE are, however, well documented. In a test the Bureau of Budget processed a query to which 137 citations were deemed relevant. Of these, LITE retrieved 128 (93.5 per cent), while by manual methods 85 (or 61 per cent) were retrieved. Tapper (1974:9) reports that an internal test has shown that the average LITE search corresponds to 18.5 man-hours of manual search - making the cost of a LITE search $23 compared to $160 for the corresponding manual search.

The LITE system has also been used for production of hard copy indexes, selective distribution of information, etc. It may serve as an example of a project created out of the need for legal information retrieval, but serving as the pivot in a legal information system where the processing of queries is only part of the service offered. Tapper states that the production of indexes and SDI-services has in many ways become more essential to the users than the processing of individual queries (1970:26). In fact, one may argue that a KWIC hardcopy index produced by LITE is a small, cheap, portable extension of the system.

In 1969 LITE processed 2,682 searches. The number of queries processed has snowballed over the years; in 1973 it was 21,558 and was expected to reach 30,780 in 1974. The charge per query was in 1974 $50, the total budget $650,000, and the breakeven point consequently 15,000 paying searches per year (the charge is waived for users outside the Defence Department). Cfr. Tapper 1974:9.

During 1973 the whole operation of the LITE system was reviewed by a special committee set up by the Committee on Appropriations of the House of Representatives. This committee reported back on 26 October 1973, recommending the continuance of the system.

One of the results of this review was the re-naming of the system. It is now known as FLITE (Federal Legal Information Through Electronics), cfr. L&CT 1975:27.

[Page 66 ]


The FLITE system has the longest period of practical operation of all systems in North America. Consequently the literature on this system is overwhelming - probably more has been written on LITE (FLITE) than on any other legal retrieval system. A selected bibliography may be found in footnote 65 to Robert P. Bigelow's article "The Use of Computers in the Law", L&CT 1975:105. Scandinavian readers are referred to Rognlien 1971. A special issue of the JAG Law Review of November-December 1966 presents the LITE system; a second edition of this issue was reprinted 1 July 1969 with some additional material.

4.3.4 The Oxford Experiments

The published results of the Pittsburgh Project had a stimulating effect not only on lawyers in the United States, but also overseas. Starting from the results published by Horty, the English lawyer Colin Tapper (now lecturer in law at Magdalen College, Oxford - then connected with the London School of Economics and Political Science) entered upon his series of experiments with case law retrieval - which has become known as the Oxford Experiments (cfr. Tapper 1973:176-182 for his own summary of these experiments and Haft 1970:143-144).

The experiments focused on retrieval problems in regard to case law material. Horty's early work had mainly been restricted to statutory material, and there are some distinct differences in legislative and judicial texts which do cause somewhat different problems. For instance the language of case law is more redundant, the obvious document unit (one decision) is not as semantically homogeneous as the obvious document unit of statutes (one article) etc.

The value of Tapper's experiments lies in two dimensions: firstly, the results he arrived at are of importance in themselves. Secondly - and in retrospect more important - there is the attitude Tapper brought to the field: the attitude of a critical and scientific inquirer who examined the basic problems of designing reliable tests for information system performance.

His dismissal of using manually compiled indexes may serve as an example: when creating the indexes, sampling proved that the inconsistency both between different indexers indexing the same decision and the same indexer indexing the same decision at different times and in different contexts was too great to give a satisfactory basis for performance tests - cfr. Tapper 1973:121. Another example is on what basis the comparison should be made. The coverage of the data base in a computer-based information system may be specified precisely, while this seldom is so in respect of a conventional search situation. If the manual search is restricted to the legal sources made available through the computer-based system, the character of the conventional search is completely altered - and the comparison will be of a correspondingly reduced value. Cfr. Tapper 1973:169-170.

[Page 67 ]


The Oxford experiment situation was complicated by the practical problems that text processing posed in relation to the computer equipment available in the early sixties. The main aims of Tapper's experiments were to compare the performance of computer-based and manual retrieval, and system performance using full text and abstracts as a data base.

Taking average recall and precision of all the questions processed in the experiments, Tapper found that of a total of 175 relevant documents, 100 (49 per cent) were retrieved by conventional methods, while 144 (70 per cent) were retrieved by the computer. On the other hand, by conventional methods altogether 109 documents were retrieved, of which 100 were relevant (92 per cent), while 496 documents were retrieved by the computer, of which 144 (29 per cent) were relevant. (Cfr. Tapper 1973:179-182 for comments and analysis of these results.) When these figures are compared with the figures given for the performance of LITE under section 4.3.3, the computer-based retrieval may appear less satisfactory.

One of the problematic aspects of the estimates of performance as stated above is the establishment of "relevant" documents, a problem discussed below at section 9.4. In his experiment, Tapper was able to establish relevance criteria completely independent of the experimental environment using citations - for instance: a question is framed on the basis of decision A. Then both decision A and decisions citing A or cited by A will be relevant in respect of the question. Cfr. Tapper 1973:168, 176-177.

4.4 A PROFUSION OF PROJECTS

4.4.1 Introduction

In the previous sections, we have sketched a few outlines, suggesting how legal information systems were initiated and developed in the United States. These outlines do not, however, to any considerable extent describe the profusion of projects which followed in the tracks of Horty and other pioneers of the late 1950s.

Tapper (1963:132) reports that 28 different projects were running concurrently in the field of legal information systems. A survey of the World Peace Through Law Center in 1966 lists about 40 projects, cfr. Prestel 1971:17. Several American universities initiated projects within the field of legal informatics - for instance the George Washington University (Professor John Lyons), the Western Reserve University, Stanford University (which since 1971 has a Law and Computer Fellow under the Computer Study and Research Program, cfr. JJ 1971:113), etc.

[Page 68 ]


In 1967 the third World Conference on World Peace Through Law was arranged in Geneva, featuring an exhibition on computers and law, and sessions on legal information retrieval. The World Peace Through Law Center reported on this conference in its first issue of the journal Law and Computer Technology (January 1968) - a welcome addition to the Jurimetrics Journal. In 1970 a third American journal appeared, Rutgers Journal of Computers and the Law, published by Rutgers University.

We will not, in this book, attempt to give an adequate presentation of the development in the United States, as it is evidently too diversified to be compressed into a few paragraphs. Also, a discussion of these projects would presuppose a somewhat broader perspective than is to be found in this book. Legal informatics includes systems other than legal information retrieval systems - for instance legislative information systems and court administration systems, systems for litigation, office management, etc. We will restrict our discussion to highlighting a few examples.

More comprehensive surveys are given by Dickerson 1968, Tapper 1970 and 1974, Prestel 1971, Haft 1970:122-163, and May 1973 - the last being a collection of presentations delivered at the First National Conference on Automated Law Research (Atlanta 1972).

4.4.2 Law Research Services, Incorporated

In 1960, a New York lawyer, Elias Hoppenfeld, with some background in computer technology, began planning for a computer-based legal retrieval system. In spite of the successful results Professor Horty had obtained with full-text systems, Hoppenfeld opted for an indexing system. This is probably partly due to the fact that while Horty initially was oriented toward statutes, Hoppenfeld was oriented toward the far more bulky case law.

Law Research Services was founded by Hoppenfeld in 1963, and it was very much a one-man business. The retrieval system became operative in February 1964 - probably the first commercial legal retrieval service to be launched. In March 1965 shares in the company were offered to the public, and all stock was sold out in a few days.

The data base of LRS was cases represented by a set of manually selected indexing terms. The user filled in a form, writing down his question in his own words. This question was then forwarded to LRS, where an editor (with a background knowledge in the relevant field of law) formulated the query. Next the query was processed, and citations to the retrieved cases printed out. This list was then edited before being

[Page 69 ]


returned to the user - the editor indicating which three cases he thought most pertinent to the user's question (a sort of manual ranking). Full text of these cases was provided in microform - and the user could have the other cases in full-text on request.

The initial system was based on a strong interaction within the LRS between lawyers and the computer system. The computer did all the routine work, while the expertise of lawyers was used to refine both query and response. A similar approach was later adopted by the Belgian system CREDOC, cfr. below in section 5.2.

In order to bring the user into closer contact with the LRS, direct access was provided through rented teletype terminals. When the pre-editing stage was eliminated, the user had to be provided with the means of formulating his query by using the indexing terms adopted by LRS. In order to facilitate this, several manual indexes were produced.

Initially the cost for using the system was 23 dollars per question. When terminals were introduced, the cost was cut down to 10 dollars per question, a line charge of 2 dollars, and a monthly rent of 18 dollars, cfr. Tapper 1973:185 and Haft 1970:169 at note 711. It is reported that the majority of law firms in the state of New York at one time were subscribers to LRS (Tapper 1973:185); in 1964 there were more than 5 000 clients throughout the country (Seipel 1970:726), and plans for expansion to other states were considered.

LRS did, however, meet with heavy difficulties. Computer facilities were originally provided by Sperry Rand (Univac III), but the relationship between Sperry Rand and LRS was marred by financial difficulties (2 CLSR 309 at page 313-315). Arrangements were made with Western Union Tel. Co. (1963-1966) to create and maintain a data base and provide communication facilities, cfr. Seipel 1970:326. But this relationship, too, was marred by misunderstandings, cfr. 1 CLSR 1002 (Law Research Inc. v. Western Union Tel. Co., 27. May 1968) and 3 CLSR 161 (Law Research Service of Missouri, Inc. v. Western Union Tel. Co., 7. December 1971, cfr. 336 F. Supp. 510).

The difficulties, as reported in 1 CLSR 1002, partly arose out of misunderstandings concerning the use of an algorithm for generating octal digits employed when addressing the FASTRAND drums, and as index numbers for the citation data in LRS' thesauri. Also, the billing tapes did not meet the requirements of LRS.

And more trouble lay ahead. A suit was brought against LRS, based on the claim that information offered by LRS at the time public shares were

[Page 70 ]


issued had been misleading. In fact the old trouble with Sperry Rand was at the back of this matter, as the disagreement with Sperry Rand had been understated in the information to prospective shareholders. Cfr. 2 CLSR 309 (Globus v. Law Research Service, Inc., 24. February 1970) cfr. 287 F. Supp. 188 (Morton Globus et al. v. Law Research Service Inc. and Blair & Co. v. Paul Wiener, 2. July 1968).

Lastly, West Publishing Company brought a suit in June 1966, alleging that LRS had copied certain of the West key number digest indexes for their computer-based retrieval system, thereby infringing various West copyrights, cfr. Duncan/Peck 1972:99, and 2 CLSR 984 at page 985. A settlement was reached during the LRS bankruptcy proceedings, in which all claims were dismissed, and it was stipulated that all the material of LRS claimed to represent a copyright infringement should be destroyed - including the data bank itself. It ought to be stressed that LRS did not admit to copyright infringement, and that this claim also was dismissed as part of the settlement. Furthermore the settlement stipulated that West "shall not create and offer for sale to the Bar or to the public a computerized system for legal research identical to or substantially identical to" the LRS system documented during the proceedings. West was not barred, however, from creating an independent computer-based system for legal research "without using any confidential information obtained from the documents or depositions" made available to West during the proceedings. Cfr. 3 CLSR 561 (West Publishing Co. v. Law Research Service, Inc., 25 May 1972).

A related case is Computer Searching Service Corp. v. Ryan 2 CLSR 984, 2 March 1971. This is an action taken by West against a subsidiary of LRS mainly owned by LRS itself. West claimed that CSSC was "threatening to take over the infringing acts" of LRS.

In spite of the misfortunes descending on the first commercial venture in this field, there are persistent rumors of LRS once more moving into the market, cfr. Tapper 1974:23. And its adversary, West Publishing Company, has also recently offered a computer-based legal retrieval service (WESTLAW) - based on the Canadian QL system, cfr. below in section 6.3.

4.4.3 RIRA: Reports and Information Retrieval Activity

RIRA did in fact originate within the Internal Revenue Service in 1962, and the point of departure was the need for a systematic co-ordination of the decisions made by lawyers within the IRS. The Office of Chief

[Page 71 ]


Counsel of IRS employs about 650 lawyers (Cohen/Uretz 1968:2); more than half of these are stationed at 35 field offices throughout the country. A number of statutes are administered by this organization; attorneys of the IRS have for instance exclusive jurisdiction in the preparation and trial of cases filed with the Tax Court.

Tax law is of a rather special nature. In most countries - even where the legal system depends heavily upon case law - the tax law is codified, and tax can usually only be levied with direct reference to a statute. Also, tax law cases may be rather intricate; as Tapper (1973:201) has put it, one tries to "exploit quite deliberately any inconsistencies, ambiguities or loopholes. Thus litigation is heavily concentrated on fringe situations. Large sums of money are usually involved, so the very greatest legal ingenuity can be purchased on both sides..." In addition, decisions made by the tax administration ought to meet the standards set by the principle of equality, standards which confront any public administration. As Cohen/Uretz (1968:3) point out, the IRS "cannot be concerned simply with taking the position which gives the best chance of winning the particular case. One of the major objectives of the Service in administrating the tax law is uniformity and consistency of treatment among similarly situated taxpayers". We think this is an important observation, and it characterizes a typical difference in the information needs of a private and a public lawyer. This difference may indeed be one of principle, corresponding to a difference in the standards to be met by legal information systems inside and outside the administration.

The RIRA system was designed to solve the co-ordination problem between the separate offices and lawyers of IRS. The data base consists not only of legal precedents (closed cases), but also of pending cases. It is not known how many cases are at any one time included in the RIRA data base, but it is stated that at any given time, about 14 000 pending cases are entered into the system (Cohen/Uretz 1968:3, Duncan/Peck 1972:26).

Documents are stored as a set of indexing terms. Before the introduction of RIRA, there existed no standard terminology adequately describing the legal problems of a case. A thesaurus was developed, called the Uniform Issue List. This is largely based upon the Internal Revenue Code, and we see that the intimate relationship between the code and the legal problems characteristic of tax law has been exploited in designing a relatively simple, but unambiguous and adequate, indexing language. Also, the indexing terms will be easy to use for lawyers who are familiar with the code.

[Page 72 ]


The final index consists of some 6 000 separate subjects, and is produced in two formats: the Uniform Issue List and a KWIC-format of the same list. The KWIC index gives an alphabetic cross-reference index to the Uniform Issue List and the individual tax cases, cfr. Duncan/Peck 1972:23, Seipel 1970:276-277.

Each subject is identified by 8 digits and a character string of maximum 55 letters:
1-4 IRC breakdown
5-6 Major breakdown
7-8 Sub-breakdown
9-63 Brief word description

The IRC sections do not extend beyond the 8 000-series, and the 9 000-series are consequently employed to represent issues not covered by the code. Cfr. Cohen/Uretz 1968:4, Duncan/Peck 1972:23.
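The record layout above can be illustrated with a small sketch. It assumes a fixed-width record of at most 63 columns; the function and class names, and the sample entry itself, are hypothetical and are not taken from the RIRA documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IssueEntry:
    irc_section: str   # columns 1-4: IRC breakdown
    major: str         # columns 5-6: major breakdown
    sub: str           # columns 7-8: sub-breakdown
    description: str   # columns 9-63: brief word description

    @property
    def covered_by_code(self) -> bool:
        # The 9 000-series represents issues not covered by the code.
        return int(self.irc_section) < 9000


def parse_issue_entry(record: str) -> IssueEntry:
    """Split one fixed-width Uniform Issue List record into its fields."""
    if len(record) > 63:
        raise ValueError("record is longer than 63 columns")
    digits = record[:8]
    if len(digits) != 8 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("columns 1-8 must be digits")
    return IssueEntry(digits[0:4], digits[4:6], digits[6:8], record[8:].strip())


# Hypothetical entry, invented for illustration only.
entry = parse_issue_entry("01620100TRADE OR BUSINESS EXPENSES")
other = parse_issue_entry("90010000ISSUE NOT COVERED BY THE CODE")
```

Here `entry.covered_by_code` is true, while `other.covered_by_code` is false because its first four digits fall in the 9 000-series.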

Documents in the RIRA data base are identified with a unique 11 digits code, constructed as prefixes and suffixes to the docket number:
1 - Code indicating whether case is before court of original jurisdiction, Court of Appeals or Supreme Court
2 - Reopening code (0 = original, 1 = reopened once etc.)
3 - Function code (Tax Court, refund suit etc.)
4-9 - Officially assigned docket or jacket number
10-11 - Year case originated

(Cohen/Uretz 1968:5, Duncan/Peck 1972:23)
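As a sketch of how such an identifier decomposes, the following splits an 11-digit code into the five fields listed above. The meaning of the individual court and function codes is not given in our sources, so they are kept as raw digits; the names and the sample value are hypothetical.

```python
from typing import NamedTuple


class CaseId(NamedTuple):
    court_level: str   # position 1: original jurisdiction, appeals, Supreme Court
    reopening: int     # position 2: 0 = original, 1 = reopened once, etc.
    function: str      # position 3: Tax Court, refund suit, etc.
    docket: str        # positions 4-9: docket or jacket number
    year: str          # positions 10-11: year case originated (two digits)


def parse_case_id(code: str) -> CaseId:
    """Split an 11-digit RIRA-style document identifier into its parts."""
    if len(code) != 11 or not (code.isascii() and code.isdigit()):
        raise ValueError("expected exactly 11 digits")
    return CaseId(code[0], int(code[1]), code[2], code[3:9], code[9:11])


# Hypothetical identifier, invented for illustration only.
case = parse_case_id("10200123468")
```

The docket number stays a string so that leading zeros survive, which matters if the identifier is later reassembled.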

Somebody has to do the indexing work. Usually this is done by special indexers, but RIRA has opted for a simpler system: the lawyers handling the case do the indexing themselves. This is done on a form provided by the system, and updating is done monthly on a computer-generated turnaround report, cfr. Cohen/Uretz 1968:5, Seipel 1970:276.

In addition, an "abstract" of each case is prepared by the attorney. This is a description of the case and the positions taken by the Government and the taxpayer. This abstract is also updated when relevant changes take place. The abstract is not converted into machine-readable form, but is microfilmed. When filmed, it is given a 3-digit reel number and this (along with the number of the frame of the first page of the abstract) becomes the "microfilm access number", and is connected to the case number in the indexes. Microfilm reader-printers are provided at all offices, and hard copies of the microfilmed abstracts can be obtained in a matter of seconds.

Retrieval in the RIRA system pivots on the manual indexes. The user may as retrieval criteria use either one of the codes from the Uniform Issue List, or he may choose words from the KWIC index. In this way,

[Page 73 ]


he may isolate possible relevant cases, and will consult the microfilmed abstract to assess their relevance and obtain information which is sufficient for his use.

The abstracts are also prepared by the attorneys handling the case. This has several advantages: the user already has a deep insight into the case, and can with relatively little effort produce an adequate abstract. This solution is especially acceptable where no or few formal norms are laid down for writing the abstract (norms such as defined language, defined notation, etc.).

The intermarriage of computer and microfilm techniques makes RIRA an interesting hybrid system, characterized by the simple indexing language, which relies heavily on the structure of the Internal Revenue Code, and the replacement of professional indexers with the users themselves. We think this is a solution very well suited to an environment where the users are disciplined through their legal training to relate cases to sections in a statute. The solution of access to full text is also interesting: the abstract is stored in microform, and the address to this abstract is contained in the computer-produced indexes. As costs of computer storage and transmission come down, this solution may become less attractive, and one will probably prefer to store the abstracts in machine-readable form and have access to them through terminals. The RIRA design does, however, have interesting qualities, providing great flexibility at no great cost, and providing a nation-wide information service by using the computer to produce manual indexes.

4.4.4 JURIS: Justice Retrieval and Inquiry System

The JURIS project was initiated in order to give the attorneys of the United States Department of Justice better possibilities for conducting legal research. Introducing the system, Kondos (1971:147) points out that there are certain "minimal standards of due process and equal protection of law" to be afforded all United States citizens, and he goes on to state that often "fulfilment of these requirements depends on timely access to reliable and up-to-date information". The justification of creating a better information system is, as these remarks illustrate, not mainly a question of economy, but of the rule of law. As in the case of the Internal Revenue Service's system RIRA, JURIS was created in order to overcome the availability factors constricting the access to relevant legal sources in the existing information system. Users of JURIS are the over 2 500 attorneys in the Department of

[Page 74 ]


Justice, functioning as lawyers, and others needing access to legal sources in other capacities, including U.S. attorneys and their staff in the 93 judicial districts throughout the country.

The objective of JURIS is to make available to these users legal material generated within the department itself, cfr. Basheer 1973:55. This material consists mainly of legal briefs, memoranda, legal handbooks, form books, appellate briefs, legal procedure and policy correspondence, summaries of significant reported decisions, selected legal periodicals, case file intelligence, and evidential material for protracted cases - cfr. Kondos 1971:148, 1973:2-1. The material exemplified here falls within the category of administrative decisions and opinions - a legal source of great importance within public administration, and of vital importance as regards the principle of equality. In addition to this, the total text of the United States Code is stored in the system. For a more complete survey, cfr. Kondos 1973:4-1.

It is obvious that the data gathering and registration of such a conglomerate of legal sources give rise to problems. To select significant documents, individuals in each section of each litigating division are appointed. The selected documents are indexed and abstracted, when possible by the author of the document. The document is then typed on a typewriter which produces the text in machine-readable form, cfr. Basheer 1973:63. In at least one instance, documents (i.e. the Solicitor General's brief to the Supreme Court) are captured in conjunction with a computer-based photo-composition system, cfr. Kondos 1973:3-1.

Although we describe JURIS here as an existing and operative system, both Kondos 1973 and Basheer 1973 discuss it in conditional terms.

JURIS is an on-line terminal-oriented system, cfr. Tapper 1973:204; the first terminals in the main Justice Department building were planned for installation in the autumn 1972. Installation of terminals in the remaining U.S. attorneys' offices was scheduled for the second half of 1973, cfr. Basheer 1973:57. Tapper (1974:11) reports that the present system can support up to 100 terminals. Apart from the Department of Justice, other central administrative offices have made considerable use of the system, cfr. Kondos 1974.

In JURIS we see a change in technique compared to RIRA - microform storage is to a great extent replaced by computer storage, and computer-produced manual indexes for searching purposes are replaced by on-line terminals.

[Page 75 ]


The retrieval program is based on an adaptation of the program developed by the National Aeronautics and Space Administration (NASA) for their Remote Console (RECON) System, cfr. Lancaster/Fayen 1973:103-109, Basheer 1973:56. This is an advanced on-line terminal-based retrieval program originally developed by the Lockheed Corporation, cfr. Tapper 1973:204. RECON was created as a retrieval program for users requiring access to NASA's extensive technical library, cfr. Kondos 1971:148. RECON appears to be just one subsystem of the NASA system STIMS (Scientific and Technical Information Modular System). The File Maintenance Subsystem of STIMS is also included in JURIS, along with file maintenance routines from FLITE, and some special programs for interfacing the different subsystems, cfr. Kondos 1971:153.

A number of retrieval strategies are available on JURIS. The basic command is SELECT. Following the SELECT command, the user may specify search terms (words or phrases). The JURIS system then retrieves the references to the documents satisfying this query, i.e. documents which contain at least one of the specified words or phrases. Through a new command, COMBINE, the user may combine the sets of documents, employing the standard Boolean operators "and" and "or". Viewed as a whole, the SELECT and COMBINE commands represent conventional Boolean-based text retrieval strategy. In JURIS this is a two-step operation, while in most text retrieval systems search words are selected and combined in the same statement. Even though somewhat more cumbersome, the JURIS solution may have a certain pedagogic effect - making the user more aware of the nature of the retrieval strategy.
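The two-step strategy may be illustrated by a minimal sketch built on a toy inverted file. This is not the JURIS or RECON implementation: the class and method names are hypothetical, only single words (not phrases) are handled, and the three sample documents are invented.

```python
from typing import Dict, List, Set


class Session:
    """Toy two-step Boolean retrieval: select sets first, then combine them."""

    def __init__(self, documents: Dict[int, str]) -> None:
        # Inverted file: each word maps to the documents containing it.
        self.index: Dict[str, Set[int]] = {}
        for doc_id, text in documents.items():
            for word in text.lower().split():
                self.index.setdefault(word, set()).add(doc_id)
        self.sets: List[Set[int]] = []

    def select(self, *terms: str) -> int:
        """Store the documents containing at least one term; return the set number."""
        hits: Set[int] = set()
        for term in terms:
            hits |= self.index.get(term.lower(), set())
        self.sets.append(hits)
        return len(self.sets)  # set numbers start at 1

    def combine(self, left: int, operator: str, right: int) -> int:
        """Combine two earlier sets with 'and' or 'or'; return the new set number."""
        a, b = self.sets[left - 1], self.sets[right - 1]
        if operator == "and":
            result = a & b
        elif operator == "or":
            result = a | b
        else:
            raise ValueError("operator must be 'and' or 'or'")
        self.sets.append(result)
        return len(self.sets)


session = Session({
    1: "search and seizure of an automobile",
    2: "seizure of a vessel",
    3: "automobile theft",
})
s1 = session.select("seizure")
s2 = session.select("automobile")
s3 = session.combine(s1, "and", s2)
s4 = session.combine(s1, "or", s2)
```

Each step yields a numbered set which the user can inspect before combining it further, and this is where the pedagogic effect mentioned above would arise.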

A number of additional features make the retrieval strategy more flexible. The data base may be segmented through the use of prefixes. The query

CD/AUTO

requires that to satisfy the query a document must belong to the subset of the data base containing United States Code. (Cfr. Kondos 1971:151-152 for a listing of prefixes.)

Through a LIMIT command, the user may restrict his search to documents from a certain period of time. This chronological segmentation of the data base may be conducted initially or as a final restriction after having retrieved a number of references.

Results from a query may be kept in the system through the KEEP

[Page 76 ]


command. This is a function of great value, as it makes it possible for the user to construct standard queries of a rather complex nature, employing them as building-blocks in a number of queries.

JURIS has the possibility of having a synonym thesaurus, but this is not utilized at the moment. In order to remedy the difficulty posed by the specificity of the full text, the EXPAND command may be employed. Through the EXPAND command, part of the inverted file is displayed, showing which words in the data base (or in the segment of the data base specified) are alphabetically close to the search terms. The user may select synonyms from the displayed list, and expand the original query with these words. In JURIS, part of the West Topical Index is also contained, and the documents are indexed with West Key Numbers. Using the EXPAND command, the user may go into the index, explore the structure of the part of the index he is interested in, and select appropriate terms as search words.
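The idea behind displaying alphabetically close words can be sketched as follows. The function name, the window size, and the sample vocabulary are assumptions made for illustration; the actual display format of EXPAND is not reproduced here.

```python
import bisect
from typing import List


def expand(vocabulary: List[str], term: str, window: int = 2) -> List[str]:
    """Return the words that sort alphabetically close to the search term."""
    words = sorted(set(vocabulary))
    # Position of the term, or of its insertion point if it is absent.
    position = bisect.bisect_left(words, term)
    start = max(0, position - window)
    end = min(len(words), position + window + 1)
    return words[start:end]


# Hypothetical vocabulary drawn from an inverted file.
vocabulary = ["search", "seize", "seized", "seizing", "seizure", "seizures", "select"]
neighbours = expand(vocabulary, "seizure")
```

From such a list the user could pick variant forms like "seized" or "seizures" and add them to the original query. Alphabetical proximity finds inflected forms, but not true synonyms with a different stem.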

The JURIS system also has a number of ways of presenting the retrieved documents, among them a KWIC format with highlighting of search words for fast relevance assessment. There are also a number of other commands not described here, for instance an EXPLAIN command for the inexperienced user. For a more detailed description of the JURIS search language, cfr. Basheer 1973:58-63 and Kondos 1973:11-1-13 with examples.

The JURIS system is in many ways an advanced system, and shares many characteristics with systems like IBM's STAIRS, Siemens' GOLEM, or Industri-Matematik's IMDOC. It does, however, lack an adequate ranking function - but the usefulness of such a function may be questionable.

Tapper reports in 1974:10 that JURIS is under review, and Kondos states (1974:15) that a revised version is due to become operational in July 1974, providing even more sophisticated features. Changes will also be made in the communication aspect of the system, to allow access through 100 terminals throughout the United States. The lack of ranking functions will probably be remedied at the same time as the possibility of implementing a vector retrieval routine is being looked into.

No objective evaluation of JURIS is known, but Basheer (1973:56) reports that in one instance, eight "search and seizure" problems were researched concurrently, one attorney using JURIS, the other conventional methods. The first finished his search in 28 1/3 minutes, the second in 2 hours 20 minutes. Also, Basheer mentions an example where one

[Page 77 ]


hour of attorney time using JURIS uncovered 100 additional statutory references after a thorough search by the Criminal Code Revision Unit. Tapper (1974:11) points out that these evaluations clearly are not satisfactory, and that the Department contemplates comparative tests, primarily against Mead Data Central's LEXIS.

[Page 78 ]


5 Europe

5.1.1 Introduction

Below, we are going to present a brief survey of European activity within the field of computer-based legal reference retrieval systems. Although the development in Europe has lagged 5-7 years behind that in the United States (cfr. the early survey of Steiner 1967), the activity has been considerable. Also, owing to our Norwegian perspective, it has been easier for us to discern this activity than that taking place in other parts of the world.

We do not want to make a comprehensive survey, but will try to highlight projects that we have found to be of interest. Also, we will in this section focus our attention not so much on the details of design of the retrieval systems as on points in their basic design, trying to relate the project of which a system usually is a part, to the national development.

5.1.2 European co-operation

(1) The Council of Europe

The Council of Europe has been the main forum for European experts within the field of legal informatics. In 1968, the Committee of Experts on the publication of national state practice in the field of public international law, recommended to the European Committee on Legal Co-operation (CJJ) that a committee of experts should be appointed to study the question of harmonization of technical means of programming international treaties into computers. The CJJ approved this recommendation, which was in turn approved by the meeting of the Ministers' Deputies. The new expert committee - which was given the name "Committee of experts on the harmonization of the means of programming legal data into computers" - held its first meeting in September 1969. The introductory memorandum prepared by the Directorate of

[Page 79 ]


Legal Affairs contains a survey of the state of the art at that time, cfr. Council of Europe EXP/Ord.Jur. (69) 1.

The history of this committee will reflect the history of computer-based legal information retrieval in Europe for the next few years. In the beginning, there were rather outspoken confrontations between the "indexing" and "full text" philosophies - something which diminished as the systems were developed into versions accepting both documents in full text and document surrogates composed of indexing terms. Nor would it be too much of an injustice to maintain that the harmonization problems occasioning the establishment of the committee were not of a critical nature. This is partly due to the fact that the systems created in the various countries were mainly dedicated to the documentation of national law, and that the validity of law is restricted by the jurisdiction of the state - there being rarely any need for exchanging information between national centers; and in those cases where the need arose, language problems and lack of familiarity with the legal system would make computer-based retrieval systems a less adequate means of acquiring that information.

As stated above, it was the problems related to treaties which were the cause of setting up the committee - and these are certainly relevant to more than one country. As far as we know, however, only one system has been created in Europe dedicated to handling treaty information. This is the Spanish system IBERTRAT, initiated in 1972, cfr. Etcheverria 1974.

The committee recommended certain harmonization measures in the field of legal data processing, which were adopted as resolution (73) 23 by the Committee of Ministers of the Council of Europe. This resolution sets out a minimal list of headings for machine-readable legal information, specifying such headings for four different types of legal sources.

The Committee also recommended that a new committee should be set up "to keep the state of research and development in the field of legal data processing in Europe under review". This is a recognition of the vital function of the former committee as a forum for discussion in a field where the speed of development is high, and where the sources of information are hard to come by and seldom up to date.

The new committee - named the "Committee on legal data processing in Europe" - held its first meeting in October 1974. In 1975 it arranged the Third Symposium on legal data processing in Europe in co-operation with the Norwegian Research Center for Computers and Law, Oslo.

[Page 80 ]


(2) INTERDOC

Almost simultaneously with the Council of Europe initiative, the organization INTERDOC was founded. This took place at a meeting in Brussels on 6 October 1969. INTERDOC became an international association, open to lawyers from all countries, with the purpose of promoting the use of information processing in the legal field. The organization took the initiative for the first international conference of legal informatics (together with its French counterpart, the ADIJ, cfr. below in section 5.3.1), and published the proceedings of this conference (cfr. INTERDOC/ADIJ 1973). INTERDOC also publishes (since 1971) an Interim Bulletin in French and English, later to be replaced by a journal.

5.1.3 CELEX1

1 We are indebted to Mr. Jochen Streil of the Court of Justice of the Communities for comments on this section.

In 1967, a working party within the Legal Service of the Commission of the European Communities was set up to look into the possibility of creating a computer-based legal information service for Community law, cfr. Bauer-Bernet/Streil 1973:9. The system came into preliminary operation in 1970 (Tapper 1973:283), and since January 1971 it has been available to a limited number of users.

The system is formally known as CELEX, but has also been known as "The EEC System", CE-LEX, Database LEX of the "Communitas Europa" etc., cfr. Bauer-Bernet/Streil 1973:9. The system seems to have been confused with the Belgian CREDOC from time to time, cfr. Vallet (1968).

Community law has several features which generate problems in regard to an information system. It is a body of law valid in relation to several countries and represented in several equally valid languages. Some documents, like national case law relevant to community law, are, however, only readily available in one language. The "half-life" of a legal instrument within the community law is very short (cfr. Bauer-Bernet/Streil 1973:9); this is related to the fact that the level of generality varies very much: from constitutional amendments to subsidy rates for pig-feed (cfr. Tapper 1973:284). Community law may, in fact, be pointed out as an exponent of what has been called the "trivialisation of law" (Simitis 1974:5), an effect of the increasing governmental involvement in social affairs. The short "half-life" has its counterpart in a high production rate of new legal sources. And viewed as a system, the Community law -

[Page 81 ]


though young in years - does indeed indicate that it is prone to become the victim of an information crisis: a legal norm system of a detailed nature undergoing rapid change and being relevant for a great number of users in different countries.

The documents are characterized by a set of fixed fields, about 30 of which contain bibliographical information. Care has been taken to capture the complex citation structure of the material, cfr. Bauer-Bernet/Streil 1973:17-18, Streil 1973:29-30. In addition a variable field of text is registered - this may be the full text of the document (which is the case with basic instruments) or just a document surrogate composed of a few descriptors (which is the case with routine administrative instruments).

In 1975 it was reported that basic treaties, secondary Community legislation, supplementary legal acts, and complementary legal acts had been registered. From the beginning of 1976 all preparatory documents and parliamentary records forming part of the legislative process of the Community are being incorporated into the system. Registration of national measures to implement Community measures and relevant decisions of national courts has commenced. Other documents are at present available on a card-index system, cfr. CELEX 1975.

As stated above, the system was developed by the Legal Service of the Commission, primarily for the use of the Commission itself. By now the system is interinstitutional and is shared between Parliament, Council, Commission, Court of Justice, and the Economic and Social Committee. The Commission is located in Brussels, while the computer installation of the Community is in Luxembourg. CELEX is therefore dependent upon teleprocessing. In the years 1969-1973 CELEX was built around the IBM system DPS (which has also been used by the French center CEDIJ). This system is batch-oriented, and up to eight queries were run in two daily batches (cfr. Tapper 1973:285). In 1973, however, CELEX switched to the more advanced IBM system, STAIRS, which is terminal-oriented, and made the data base available on-line. (Cfr. below in section 7.2 for a somewhat more detailed description of STAIRS.)

CELEX is envisaged as developing into a Community-wide service - there are a number of users outside the Commission who have exerted a steady pressure to have the system extended. A project to look into the feasibility of such a more comprehensive system was launched in 1975, and the optimistic initial report indicates that CELEX may become available throughout the Communities by 1981, cfr. Brakenfield/Price 1975. The first two outside users have been accepted for a trial period.

[Page 82 ]


5.2 BELGIUM2

2 We are indebted to the President of CREDOC, M. Edouard Houtart, for comments on this section.

Compared to the United States, Belgium is a small country, but nevertheless one that reacted rather quickly to the legal information crisis. As Tapper pointed out (1973:188-189), a small country is also a small market - and the indexes, digests, and other manual research aids abundant in the United States are lacking in most smaller countries.

Impressed by the advantages offered by operative systems in the United States, l'Assemblée des Batonniers de Belgique (the Union of Belgian Lawyers) and la Fédération des Notaires (the Federation of Notaries) set up a working group in 1966 (cfr. Houtart 1969:16, 1973:21). The report of this group formed the basis for the creation of CREDOC (Centre de documentation juridique), and this center became operational in September 1969 - offering its services to all Belgian lawyers and in all fields of law. The original programs were developed by the Paris firm Information rationelle,3 but were later radically redesigned by CREDOC's own staff (Prestel 1971:66).

3 This firm also had a hand in the development of retrieval programs for Parisian lawyers (Barreau de Paris), demonstrated in the fall of 1968, cfr. Kornerup 1969:6.

In 1973 CREDOC employed 12 staff members: 7 doctors of law, 3 secretaries, 2 programmers, and 1 operator.

The computer-based system of CREDOC is an indexing system. The documents consist of one article in a statute, one decision, one article of legal literature etc. These documents are described by way of an indexing language.

The main component in this language is the descriptors (descripteurs). These are contained in a thesaurus, reported to consist of 7 000 entries (Wallemacq 1974:2). In addition, the documents may be characterized by modifiers - ante-descriptors (facettes), or post-descriptors (spécificateurs). (The number of facettes is said by Wallemacq 1974:2 to be 65, and the number of spécificateurs 740, while the corresponding figures of Houtart 1973:3 are 150 and 550.) These indexing terms combined give the possibility of creating a great number of concepts (Houtart 1973:3 states the number at that time to be 43 000 concepts, the corresponding number given by Wallemacq 1974:2 is 55 000).

The thesaurus lists the indexing terms in both French and Dutch, the two official languages of Belgium. To some extent, German terms are employed as well. The terms are expressed as a four-digit code, which is

[Page 83 ]


identical in all languages. This code makes the indexing independent of the original language of the document, something very important in the bi-lingual society of Belgium.
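The effect of the shared code may be illustrated by a minimal sketch in Python. The codes, the terms, and the function names are our own assumptions, chosen for illustration only, and are not taken from the CREDOC thesaurus.

```python
# Hypothetical thesaurus: each four-digit code carries one label per language.
# Codes and labels are invented examples, not CREDOC data.
THESAURUS = {
    "0417": {"fr": "bail", "nl": "huur"},
    "1032": {"fr": "vente", "nl": "verkoop"},
}


def build_lookup(thesaurus):
    """Map every (language, term) pair to the code shared by all languages."""
    lookup = {}
    for code, labels in thesaurus.items():
        for lang, term in labels.items():
            lookup[(lang, term)] = code
    return lookup


LOOKUP = build_lookup(THESAURUS)


def encode(lang, terms):
    """Translate indexing terms given in one language into codes."""
    return [LOOKUP[(lang, term)] for term in terms]


# A document indexed by a French-speaking lawyer ...
french_index = encode("fr", ["bail", "vente"])
# ... is found by a query formulated in Dutch, because both reduce to codes.
dutch_query = encode("nl", ["huur"])
match = set(dutch_query) <= set(french_index)
```

Since both the document surrogate and the query are reduced to codes before they are compared, the language in which either was written plays no part in the matching.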

In addition to the listing of indexing terms, the CREDOC staff has also developed hierarchical structures (rapports paradigmatiques) showing the relationship between indexing terms, cfr. Aars 1971:9. These structures are used to broaden or narrow queries. It is quite time-consuming to create these structures; Prestel (1971:61) reports that about half the time of the CREDOC indexers is used to elaborate indexing structures.

The indexing of documents is conducted by lawyers on the CREDOC staff. Originally the analysis of documents was limited to title and abstract, but now encompasses the whole document, as it was found that title and abstract alone did not always give an adequate understanding of the document (cfr. Prestel 1971:58). An example of a filled-in analysis form is given by Prestel 1971:59, and the form is discussed by Wallemacq 1972.

CREDOC aims at covering all of Belgian law. Wallemacq reports (1974:3) that the data base at that time consisted of approximately 85 000 documents, comprising in particular all the jurisprudence and articles published in 37 legal journals since January 1968.

The questions are put to CREDOC in much the same way as a document is captured. The question is not posed by the user direct to the computer, but forwarded to a CREDOC staff member. The question is transformed into a query constructed of the appropriate indexing terms, combined by Boolean operators. As a further sophistication, there exists a retrieval strategy called pondération, implying that indexing terms are eliminated one by one through a predecided sequence, giving responses of ascending generality - a sort of simple ranking function.
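The principle of pondération, as described above, may be sketched as follows. This is a minimal illustration under our own assumptions: the document surrogates, the terms, and the function names are hypothetical, and the ordering of the terms simply stands in for the predecided elimination sequence.

```python
# Hypothetical document surrogates: document id -> set of indexing terms.
DOCUMENTS = {
    "D1": {"lease", "termination", "notice"},
    "D2": {"lease", "termination"},
    "D3": {"lease"},
    "D4": {"sale"},
}


def search_all(terms):
    """Boolean AND: documents indexed with every term of the query."""
    return sorted(d for d, index in DOCUMENTS.items() if terms <= index)


def ponderation(terms_by_priority):
    """Drop the least important term at each step and search again.

    terms_by_priority lists the terms from most to least important.
    Returns (query terms, newly found documents) pairs, the most
    specific query first.
    """
    seen = set()
    levels = []
    for n in range(len(terms_by_priority), 0, -1):
        query = terms_by_priority[:n]
        hits = [d for d in search_all(set(query)) if d not in seen]
        seen.update(hits)
        levels.append((query, hits))
    return levels


levels = ponderation(["lease", "termination", "notice"])
for query, hits in levels:
    print(" AND ".join(query), "->", hits)
```

Each step yields only the documents not already found by a more specific query, so the output is ordered from the closest match to the most general one, which is what gives the strategy its ranking effect.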

Examples of questions are given by Prestel 1971:63-64, Houtart 1973:8-11 and Aars 1971:19-22.

The response received by the user is a list of references, accompanied by a short commentary indicating the probable solution to the original question. CREDOC does not normally supply the full text of the document (which, of course, is not stored in machine-readable form), but will on request supply photocopies as an additional service. If the user so desires, CREDOC will also undertake to translate the documents (cfr. Houtart 1973:4).

In addition to the main service of computer-assisted legal research, CREDOC also supplies the legal profession with other services in which the computer can be of use. For instance,

[Page 84 ]


CREDOC runs a selective distribution of information service in conjunction with the updating of the data base. For a list of additional services, see Wallemacq 1974:4.

The questions are free of charge, but all members of the Union of Lawyers and the Federation of Notaries have to pay a yearly subscription fee. In 1969 a monthly average of 165 questions were processed; of these, 60 per cent were posed by lawyers, 20 per cent by notaries, 13 per cent by judges, and 7 per cent by universities, cfr. Prestel 1971:67.

The question is processed off-line in batch runs. Response time - the time elapsing from when the user poses his question till when he receives his answer - was 4-12 days in 1970, but has since then been reduced by new equipment. In January 1971, CREDOC purchased its own Honeywell Bull 115 General Electric computer, much the same as the computer used earlier on a service bureau contract. Since then, the system has been run once or twice a day (in contrast to twice a week previously) and response time is reported to have dropped to about one day. Conventional teletype facilities were also available at this time. Since January 1974 the system has been operational on an IBM 370. It is stated that conversion to an on-line system is feasible, but one seems inclined to doubt that sufficient user need exists for this to be economical (Wallemacq 1974:3).

CREDOC is one of the most widely discussed systems in Europe, and consequently great interest has been attached to its performance. In the first period of operation, the questions of 50 selected users were analyzed. Of the 347 questions put to CREDOC by this group in April 1969, it was found that for 187 questions (54 per cent) the responses were adequate, for 55 questions (16 per cent) the answers were inadequate, while for 105 questions (30 per cent) the data base at this time had insufficient coverage, cfr. Le CREDOC 1969:28. (At this time CREDOC supplied additional conventional research to compensate for the coverage failure, cfr. Wallemacq 1974:3.) The main cause of inadequate responses was lack of appropriate indexing in the initial period, cfr. Prestel 1971:65. A later test (December 1970) gave the more satisfactory result of 85 per cent adequate responses, cfr. Aars 1971:10.
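The percentages quoted for the first sample follow directly from the reported counts, as a short check in Python shows. Only the three counts are taken from the source; the variable names are our own.

```python
# Counts reported for the sample of 50 selected users (Le CREDOC 1969:28).
outcomes = {
    "adequate": 187,
    "inadequate": 55,
    "insufficient coverage": 105,
}

total = sum(outcomes.values())

# Share of each outcome, rounded to whole per cent as in the text.
shares = {name: round(100 * count / total) for name, count in outcomes.items()}
```

The three counts add up to the 347 questions of the sample, and the rounded shares correspond to the 54, 16, and 30 per cent given above.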

A very interesting experiment was the comparative test conducted by Prestel (1971b), in which a set of ten identical questions were posed to CREDOC and to the Swiss full-text system CONTEX (described below at section 7.6). In this test, the differences in the two approaches have been demonstrated. The test seems to tip the scales in favor of the CONTEX system, cfr. for instance Prestel 1971b:54, where it appears that of a total of ten questions there were retrieved 18 important

[Page 85 ]


(wichtige) documents by CREDOC and 42 by CONTEX; 2 interesting (lesenswerte) documents by CREDOC, and 32 by CONTEX; and 2 irrelevant documents by CREDOC, and 10 by CONTEX. It ought to be stressed, however, that the test results may be explained by other factors than just the difference between indexing and full text systems.

An appraisal of the CREDOC system cannot, however, only take into account the kind of efficiency tests quoted above. When the Belgian lawyer set out to solve his research problems, a number of facts had to be taken into consideration. (A discussion may be found in Le CREDOC 1969.) Firstly, the system had to be completely bi-lingual (French and Dutch), something which hardly favored a full-text solution, cfr. though the DATUM solution, below at section 6.2. (An interesting discussion on the problems and possibilities posed by the bi-lingual approach is provided by Houtart 1972.) Secondly, the computer available was rather small, making the prospect of storing great volumes of text less tempting. And finally, there was the question of cost - one felt that both input and processing would be less expensive if just a concentrate of the full document was to be used. CREDOC was at the start fully financed by the legal profession itself, though the Ministry of Justice in 1969 started to provide modest financial support.

CREDOC is still to some extent branded and buoyed by its origin; a tool created by the legal profession to satisfy its own needs, and consequently as a rather down-to-earth system which is practical and which has been operational for a long time.

5.3 FRANCE4

4 We are indebted to Louis Poli, Directeur du Centre d'informatique juridique, for comments on this section.

5.3.1 Introduction

French lawyers are renowned for their systematic approach to legal problems; consequently there is no surprise in finding one of the international pioneers in the French lawyer Lucien Mehl. He was one of the first to recognize the possibilities the computer offered lawyers (cfr. Tapper 1973:196), and discussed these possibilities in a number of papers published in the 1950s. His early papers, however, focus on deontic systems rather than information retrieval systems, in much the same way as the tendency in the United States at the same time. But Lucien Mehl

[Page 86 ]


also became the founder of one of the important French legal retrieval centers, CEDIJ - cfr. below.

Though Lucien Mehl probably was the first, he was by no means the only French lawyer to become fascinated by the computer at a rather early date. In fact there seems to be a profusion of projects in legal informatics during the 1960s; Kornerup (1969:2) reports that by the end of the sixties projects had been initiated in 16 different institutions. This is also a parallel to the development in the United States at the same time.

But the French did not duplicate the American projects or solutions. There was a tendency in the United States to opt for full-text systems, while in France there has been a tendency to favor the indexing solution. A possible explanation of this diverging development is offered by Kornerup (1969:2), who points out that the system of legal education in France makes rather a great number of well-qualified lawyers available at university institutions, while the university computer installations tended to be of a rather modest size. In the period when most projects were initiated, it was easier to have material indexed than to have available a computer adequate for the processing of the rather large volumes of data required by a full-text solution.

It was felt that the great number of rather independent projects were apt to divide the resources within the field of legal informatics - to a greater extent than was desirable, some thought. In order to achieve a better co-ordination between the different projects, the organization ADIJ (Association Française pour le Développement de l'Informatique Juridique) was founded in 1970.

We will not in this section try to give a bird's eye view of the French scene, but satisfy ourselves with highlighting three major projects: CEDIJ, CRIDON, and IRETIJ.

5.3.2 CEDIJ

In 1960, Lucien Mehl published an article advocating a national center for legal information services ("Les sciences juridiques devant l'automation", Cybernetica 1960:22-40, 142-170). Six years later he got the opportunity of attempting the realization of this plan when Conseil d'Etat (where Mehl is Conseiller d'Etat) allowed him the assistance of Jean-Marie Breton (then working with the Service Central Organization et Méthodes) for one year. During this year, a survey of existing systems was completed, and a plan for the future development was suggested. In January 1967 a working group was established, financed by the Ministry

[Page 87 ]


of Justice (Centre de Coordination des Recherches). This group produced a dictionary of words based on the vocabulary of Code Général des Impôts and the decisions by Conseil d'Etat in tax law cases in the period 1935-1954. (Cfr. Breton/Mehl 1972:130.)

In May 1968 the working group started trials with the IBM program DPS (Document Processing System) on the computer installation of Institut National de la Statistique et des Etudes Economiques (INSEE), and in July of the same year the first statutes were converted to machine-readable form.

In 1969 the working party took the name CEDIJ (Centre de Recherche et Développement en Informatique Juridique). Trials in legal information retrieval with the DPS continued, and the CEDIJ began developing its own program, to become known as DOCILIS (Documents, Interrogations Libres).

In April 1970 CEDIJ ceased to be part of the structure of the Ministry of Justice, and was organized as a non-profit organization. At the same time it became operational as a documentation and research center, devoting its efforts to theoretical and applied research (Breton/Mehl 1972:130). The organization has members from the Conseil d'Etat, Cour de Cassation, Cour des Comptes, and the Ministries of Finance, Internal Affairs, and the Parliament. In 1972 the Center had a staff of 17: 1 director, 2 legal assistants, 4 lawyers, 6 programmers of varying competence, and 4 punch-operators (Breton/Rave 1972:57). In 1971 the budget was approximately 700 000 francs, mainly granted by the Ministry of Justice. In 1976, CEDIJ established co-operation with the Canadian system DATUM, which included a satellite link between the two centers.

CEDIJ includes in its data base legislation and case law. A document - which is defined as one or more paragraphs from a statute or decision (i.e. an article of a statute) - may be represented in full text, or as a surrogate composed of an abstract or indexing terms. In the case of statutes and regulatory law, the full text of the documents is registered, in the case of decisions, an abstract is registered, reducing the full text, on an average, to one-tenth (cfr. Breton/Rave 1972:33). Emphasis has been put on bibliographic data - 53 different headings are attached to each document, cfr. Tapper 1973:197. Examples of structuring of texts and bibliographical references are given by Breton 1969. In 1972 the data base comprised some 50 million characters, and by 1975 the data base was expected to include 250 000 decisions, cfr. Tapper 1973:197. An inventory of the data base may be found in Breton/Rave 1972:34.

[Page 88 ]


Since 1971 the decisions of Conseil d'Etat and Cour de Cassation have been processed concurrently, though there has been a time-lag of several weeks due to the time-consuming registration of the decisions. CEDIJ has since 1972 moved into the field of regional and, in 1976, social security law.

For the retrieval, CEDIJ used the IBM program DPS to begin with. In its original version DPS was not terminal-oriented, but a supporting program - SPHYNX - developed by the INSEE for Observatoires Economiques Regionaux, made on-line dialogue possible. Since 1974 CEDIJ has switched to another IBM program, STAIRS, described below in section 7.2. This is an advanced text retrieval program with possibilities of Boolean logic and ranking.

Of special interest are the supporting programs constructed by CEDIJ in order to improve the performance. These programs - known as DOCILIS - are of general interest over and above the part they play in the CEDIJ system.

A number of routines have been developed which aim at improving the recall capabilities of the system.

As mentioned above, the working group preceding the CEDIJ produced a dictionary. This has been elaborated and included in DOCILIS, and comprises two main sections: one for legal terms, another for non-legal terms. The non-legal section is further divided into a general and a specific subsection. These two subsections, as well as the legal section, are further divided into four categories, three of which correspond to functional categories (entités, biens, évènements) and one rest category (synthétique).

Words in the basic categories are hierarchically organized; words with several meanings may consequently be found in different hierarchical structures. The paradigmatic relations between words are registered and entered into the system, giving a listing of grammatical variations, syntactical substitutions, and quasi-synonyms. These substitutions are made automatically by the system, using the terms of the query as input. In 1975 the list of grammatical variations had 11 000 entries, the list of syntactical substitutes had 6 500 entries, and the list of quasi-synonyms had 4 500 entries. The list of grammatical variations is incomplete, as it only includes terms with different meanings. Otherwise grammatical variations are generated by a special support program, GRAMMATIC.
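The automatic substitution of query terms may be pictured by a minimal sketch. It is not the DOCILIS implementation: the three tables, their entries, and the function names are hypothetical and serve only to show how such lists can raise recall.

```python
# Hypothetical substitution lists; the entries are invented examples.
GRAMMATICAL = {"impôt": {"impôts"}}
SYNTACTICAL = {"impôt": {"imposition"}}
QUASI_SYNONYMS = {"impôt": {"taxe"}}


def expand(term):
    """Return the term together with all of its registered substitutes."""
    expanded = {term}
    for table in (GRAMMATICAL, SYNTACTICAL, QUASI_SYNONYMS):
        expanded |= table.get(term, set())
    return expanded


def expand_query(terms):
    """Expand each query term into a group of alternatives."""
    return [expand(term) for term in terms]


def matches(document_words, expanded_query):
    """AND across the groups, OR within each group."""
    return all(group & document_words for group in expanded_query)


document = {"taxe", "sur", "le", "revenu"}
literal_query = [{"impôt"}, {"revenu"}]
expanded_query = expand_query(["impôt", "revenu"])
found_literally = matches(document, literal_query)
found_expanded = matches(document, expanded_query)
```

In this sketch the document is missed by the literal query but retrieved by the expanded one, since "taxe" is registered as a quasi-synonym of "impôt".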

CEDIJ has also tried to tackle the recall failure, which may originate in the fact that the full text implies words or concepts, rather than

[Page 89 ]


representing them explicitly. This is done by manually inserting the implied terms in the documents between asterisks. Also, words not included in the document, but used as indexing terms to the document in manually prepared indexes (as may be found in a case reporter, for instance) are inserted in the same way. One has further found that the table of contents in a book may include terms of higher generality than the document itself, which consequently is of some use when it comes to reducing the specificity of the document.

Another set of routines aims at reducing precision failure. One strategy has been to make polysemes and homographs explicit. The problem of homographs is somewhat magnified because of the lack of distinction in the computer character representation of accents, i.e. the word "greve" will represent both "grève" and "grevé", "du" both "du" and "dû" etc; cfr. Kornerup 1969:5, Berger 1972:90. A term making the actual interpretation of the polyseme or homograph explicit is manually inserted in the document between asterisks.
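The homograph problem caused by the missing accents can be reproduced with a short sketch. The accent stripping below uses Unicode normalization from the Python standard library as a stand-in for the restricted character set of the period; the function names and the sense labels are our own assumptions.

```python
import unicodedata


def strip_accents(word):
    """Remove diacritics, imitating a character set without accents."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def annotate(stored_word, sense):
    """Append a disambiguating term between asterisks (label is hypothetical)."""
    return f"{stored_word} *{sense}*"


# Two different words collapse into one stored form.
collapsed = {strip_accents(w) for w in ("grève", "grevé")}

# A manually inserted term keeps the two readings apart.
strike = annotate(strip_accents("grève"), "strike")
encumbered = annotate(strip_accents("grevé"), "encumbered")
```

After stripping, both words are stored as "greve", and only the inserted term between asterisks allows a query to select one reading.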

French is a language which forms compound words in the same way as English, by joining a string of words. It was found that such phrases might reduce precision because they were difficult to single out - for instance the expression "chemin de fer" would reduce precision when searching for either the term "chemin" or "fer". This has to some degree been resolved by reducing the phrase to a synthetic word, for instance "chemin-de-fer". Care has to be taken, however, in order not to lose information (and consequently reduce the recall capabilities of the system) through exaggerated use of synthetic words. The system also allows distance operators used as a restriction in a query containing two or more words.
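Both devices, the synthetic word and the distance operator, may be sketched in a few lines. The phrase list, the sample sentence, and the function names are hypothetical, and the distance test is one plausible reading of such an operator rather than the operator of DPS or STAIRS.

```python
# Phrases to be fused into synthetic words; the list is a hypothetical example.
PHRASES = [("chemin", "de", "fer")]


def fuse_phrases(tokens):
    """Replace each known phrase by a single hyphenated token."""
    out, i = [], 0
    while i < len(tokens):
        for phrase in PHRASES:
            if tuple(tokens[i:i + len(phrase)]) == phrase:
                out.append("-".join(phrase))
                i += len(phrase)
                break
        else:
            # No phrase starts at this position: keep the word as it is.
            out.append(tokens[i])
            i += 1
    return out


def within_distance(tokens, a, b, max_gap):
    """True if a and b occur at most max_gap positions apart."""
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return any(abs(i - j) <= max_gap for i in pos_a for j in pos_b)


tokens = "le chemin de fer traverse le chemin communal".split()
fused = fuse_phrases(tokens)
```

After fusion, a search for "chemin" finds only the second occurrence in the sample sentence, while "fer" is no longer searchable on its own. This is the loss of information that the text warns against when synthetic words are used too freely.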

A question put to CEDIJ is formulated by the user in natural language and transformed to a query in the format accepted by DPS by one of the CEDIJ staff. This has probably changed, since the formal requirements of STAIRS are less strict than those of DPS, at any rate reducing the 20 minutes previously required for query construction. The answer is given in a number of alternative formats, including hard copy of the retrieved documents as they are contained in the system. It is reported (Breton/Rave 1972:55) that CEDIJ in 1971 processed about 100 questions a month; in 1976 this figure was up to ca. 150 questions a month.

Several tests of the retrieval capabilities of the CEDIJ system are known. In an early phase of the system (1970), recall was less than 50 per cent, but during 1971 - with the support of the DOCILIS programs - this

[Page 90 ]


was improved to 85 per cent. With respect to Conseil d'Etat the dictionary was not at that time satisfactory, and a sample taken 1970/71 of 42 questions gave a recall ratio of 42 per cent. In the same period 57 questions directed at the documents composed of Code des Impôts gave the recall ratio of 84 per cent. It was expected to reach a recall ratio of 93 per cent when the DOCILIS routines were perfected (Breton/Rave 1972:56).

The reduction of recall failure has been given priority over the reduction of precision failure. In 1970 precision was estimated to be less than 40 per cent, while it is expected that the supporting program of DOCILIS will bring precision up to 80 per cent.
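For the reader who wishes to relate the figures above to the underlying counts, the two ratios may be computed as in the following sketch. It uses the conventional definitions of recall and precision; the document identifiers are invented and do not reproduce any of the CEDIJ tests.

```python
def recall(retrieved, relevant):
    """Share of the relevant documents that were actually retrieved."""
    return len(retrieved & relevant) / len(relevant)


def precision(retrieved, relevant):
    """Share of the retrieved documents that are actually relevant."""
    return len(retrieved & relevant) / len(retrieved)


# Hypothetical outcome of one question: 5 relevant documents exist,
# 10 documents are retrieved, and 4 of these are relevant.
relevant = {"d1", "d2", "d3", "d4", "d5"}
retrieved = {"d1", "d2", "d3", "d4", "x1", "x2", "x3", "x4", "x5", "x6"}

r = recall(retrieved, relevant)
p = precision(retrieved, relevant)
```

In this invented case recall is four out of five and precision four out of ten, which illustrates how a system may do well on the first measure and poorly on the second.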

5.3.3 IRETIJ

Though CEDIJ was initiated by one of the pioneers of legal informatics, it was not the first documentation center to be created in France. In 1965, at the University of Montpellier, Professor Catala from the law faculty and Professor Falgueirettes from the science faculty set up a working group in order to develop a legal information system. In 1967 the work was organized in Centre d'Etudes pour le Traitement de l'Information Juridique (CETIJ), which in 1968 became part of the national research structure of Centre National de la Recherche Scientifique (CNRS). The same year CETIJ was transformed into a university institute, Institut de Recherches et d'Etudes pour le Traitement de l'Information Juridique (IRETIJ).

Professor Catala, although now living in Paris, is still the director of IRETIJ, while in Montpellier there is a staff of 26 persons (16 on part-time basis). The manager in Montpellier, Dr. Michel Bibent, has written his doctoral thesis on the system: L'informatique appliquée à la jurisprudence (Montpellier 1972).

The IRETIJ activities have from the start been financed through three main sources: the university itself, CNRS, and research contracts.

Several aspects of the work carried out by IRETIJ are of interest. In our opinion, the choice of data base and the perspective in which IRETIJ placed its information service are not the least interesting. In France there are 27 cours d'appel, which yearly decide approximately 55 000 civil and 33 000 criminal cases. Until 1940 case reports were published for most of these courts, but today there exist only two series of case reports, one for the court of appeal in Paris and one for that in Grenoble.

The lack of a systematic publishing service has been felt, not least by

[Page 91 ]


the universities, cfr. Bibent 1968:665. Consequently several universities have created their own special documentation service in order to remedy this fault in the legal information system. This has been done by the universities in Aix-en-Provence (Professor Bertrand), Grenoble (Professor Giverdon), and Lyon (Professor Vincent).

The work of IRETIJ more or less took the decisions of the appeal court as its starting point, though it set out by documenting the decisions of the Cour de Cassation (which is also taken care of by CEDIJ, cfr. above). By the end of 1971 the system of IRETIJ included some 47 000 documents, and a yearly growth of 11 000 documents is expected (cfr. Berger 1972:67-68).

The programs have been developed by IRETIJ (and are to a great extent written by Professor Filiatre of the Montpellier University Computing Center, who also wrote his doctoral thesis on this project: Conception et réalisation d'un système de documentation automatique pour la jurisprudence, Montpellier 1970).

The programs are based on an indexing philosophy, cfr. Bibent 1968:666-667, which reasons that full text will make interrogation techniques complex and problematic. The documents are not represented in full text, but rather as document surrogates composed of the keywords assigned to the decisions on publication. A short description of the original version and method used by the IRETIJ is given by Löhr 1969.

The decisions are also published with a short abstract. A comparative performance test in 1968-1969 gave - on the basis of 38 questions - a slightly better result for the abstracts than for the assigned keywords. IRETIJ did not on that occasion test the combination of keywords and abstracts, which may appear surprising, since these two often supplement each other, the keywords being of a general nature, the abstracts of a rather specific nature (cfr. Berger 1972:75). But by utilizing additional keywords from the references made from the document in question, the recall ratio was brought up from 36 per cent to 51 per cent.

IRETIJ was not satisfied with the recall capabilities of its system, cfr. also the test referred to by Kornerup 1969: 5-6. Analyzing the causes of performance failure, IRETIJ found that this was mainly due to inexhaustive indexing, and partly due to lack of uniform terminology and structure. In order to overcome these obstacles to better performance, IRETIJ developed their own guidelines to the indexing of legal documents. These guidelines were published 1969-1970. Using them, IRETIJ prepared their own document surrogates of 99 decisions of the Cour de Cassation. In a

[Page 92 ]


comparative performance test, this gave a recall ratio of 63 per cent in contrast to the 51 per cent obtained by using the keywords assigned on publication.

In evaluating the effort and the emphasis IRETIJ has put on developing adequate indexing methods, one may become critical, cfr. Berger 1972:79-80, Haft 1970:146. But pragmatic factors may have made it difficult to choose another way of improving the retrieval capabilities of the system, and the work on indexing problems may be of value in itself.

When documents are read into the system, the actual words are replaced by a five-digit code (numéros de notion). This is done automatically. The dictionary assigns the same code to all different grammatical variations of the word, and also to synonyms. One also operates with what is known as a general term (notion générale), which covers words sharing a word root, like for instance "commerce" (with grammatic derivations), "commerçant", "commercial", "commercialité", cfr. Berger 1972:86-87. The codes of such clusters have identical first four digits, while the last digit varies.
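The coding scheme may be illustrated by a small sketch. The code values, the synonym "négoce", and the function names are our own assumptions; only the principle of shared first four digits is taken from the description above.

```python
# Hypothetical dictionary: word form -> five-digit code (invented values).
DICTIONARY = {
    "commerce": "20411",
    "commerces": "20411",       # grammatical variation, same code
    "négoce": "20411",          # assumed synonym, same code
    "commerçant": "20412",      # same cluster, last digit differs
    "commercial": "20413",
    "commercialité": "20414",
    "bail": "31050",            # unrelated word, different cluster
}


def encode(words):
    """Replace the words known to the dictionary by their codes."""
    return [DICTIONARY[w] for w in words if w in DICTIONARY]


def general_term(code):
    """The first four digits identify the cluster (notion générale)."""
    return code[:4]


def same_general_term(code_a, code_b):
    """True if two codes belong to the same cluster."""
    return general_term(code_a) == general_term(code_b)


codes = encode(["commerce", "commerces", "négoce"])
related = same_general_term(DICTIONARY["commerce"], DICTIONARY["commerçant"])
unrelated = same_general_term(DICTIONARY["commerce"], DICTIONARY["bail"])
```

A query may thus be made more general simply by comparing four digits instead of five, without any further consultation of the dictionary.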

In 1972 the dictionary contained 18 000 different word forms. There were 5 000 general terms, and counting the different terms in such clusters as well, there were altogether 8 000 different codes.

The codes are loaded into an inverted file, and the queries are matched against this file. The user does not have to use the codes, but can use words which - through the dictionary - are transformed into the appropriate codes. (The system had no terminal facilities in 1972, but the operator's console might be used.) Terms in the query might be combined by Boolean operators, and also operators indicating that, as a further condition, the words should occur in the same sentence, paragraph, or document. Having received a response from such a search, the user may move to the text file and specify further restrictions, utilizing information not available through the inverted file - as for instance the year of the decision.
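The matching of queries against an inverted file, including the restriction to the same sentence, paragraph, or document, may be sketched as follows. The data layout, the codes, and the function names are hypothetical and do not reproduce the IRETIJ programs.

```python
from collections import defaultdict

# Hypothetical coded documents: a document is a list of paragraphs,
# a paragraph a list of sentences, a sentence a list of codes.
DOCS = {
    "A": [[["20411", "31050"], ["40000"]]],
    "B": [[["20411"]], [["31050"]]],
}


def build_inverted_file(docs):
    """Map each code to its (document, paragraph, sentence) positions."""
    index = defaultdict(set)
    for doc_id, paragraphs in docs.items():
        for p, sentences in enumerate(paragraphs):
            for s, codes in enumerate(sentences):
                for code in codes:
                    index[code].add((doc_id, p, s))
    return index


INDEX = build_inverted_file(DOCS)


def co_occur(code_a, code_b, level):
    """Documents in which both codes occur within the same unit.

    level is "document", "paragraph" or "sentence".
    """
    depth = {"document": 1, "paragraph": 2, "sentence": 3}[level]
    units_a = {pos[:depth] for pos in INDEX.get(code_a, set())}
    units_b = {pos[:depth] for pos in INDEX.get(code_b, set())}
    return sorted({unit[0] for unit in units_a & units_b})


same_document = co_occur("20411", "31050", "document")
same_paragraph = co_occur("20411", "31050", "paragraph")
same_sentence = co_occur("20411", "40000", "sentence")
```

The narrower the unit required, the fewer documents satisfy the query: both documents contain the two codes, but only document A contains them in the same paragraph.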

In order to promote better performance, a thesaurus is planned, which will replace the existing tools for preparing keyword document surrogates. As from 1974, IRETIJ's system is known as JURIDOC.

5.3.4 CRIDON de Lyon

We may recall that in Belgium the notaries were one of the moving forces behind the CREDOC system. In France too, the notaries have an

[Page 93 ]


important place in the legal information structure. This became even more apparent when a reform in 1959 resulted in the removal of civil servants from the districts, leaving the notary as the most approachable lawyer. This magnified the problems already created by deficiencies in the legal information system, and the notaries set up - on a co-operative basis - what were to be called Centres de Recherches d'Information et de Documentation Notariales (CRIDON), cfr. Haft 1970:128, Chamoux 1969:4-5.

The CRIDON de Lyon started to use a mechanical system in order to facilitate legal research. The step from a mechanical to a computer-based system is in principle rather short, and in 1969 CRIDON de Lyon - in co-operation with mathematicians and linguists from Grenoble University - started to develop a legal information system, Système de Documentation Notariale Informatisée (SYDONI). The program was revised in 1972, and is now being developed into an advanced version (SYDONI EURINFOR, running on an IRIS 60).

The starting point of this system was a list of descriptors that were to be used in characterizing the documents. Instead of allocating a numerical code to each indexing term - like the solution adopted by both CREDOC and IRETIJ5 - CRIDON de Lyon rather compressed the indexing term to an unambiguous alphabetical form. Thus "abandon" became "aba", and "abattement" became "...tt". To each indexing term was attached a descriptive field (zone descriptive) in which the relationships of the term with other terms were specified: synonymity, hierarchical relationship, etc. This field also contained the addresses to the documents in which the term occurred.

5 There has been co-operation between IRETIJ and CRIDON de Lyon, cfr. L&CT 12/1968:12.

The documents were transformed from the earlier mechanical system to the computer-based system. Through the work with this conversion, a certain conceptualization took place, creating logical documentation units (unités logiques de documentation) which constitute the metalanguage of the SYDONI. Through this the search may be limited to documents of a certain category, or special subjects - cfr. Haft 1970:131.

The system is terminal-oriented, and the user retrieves the documents through a dialogue. He presents his question in natural language and receives from the system a list of descriptors, which he may then combine with Boolean operators. Through the information in the descriptive fields, homonyms, synonyms, etc. may be employed or discarded.

[Page 94 ]


5.3.5 DARIUS

At the end of this brief description of some selected French projects, we should like to draw attention to the interesting hybrid system DARIUS, offering a unique combination of computer and microfiche techniques, cfr. Buffelan 1972. The system is in operation at the Institut de Recherche d'Informatique Juridique at the University of Paris-Sud (directed by Professor Jean-Paul Buffelan). The system is terminal-oriented; the user conducts his search on a systematic index or an index of keywords. These indexes are projected on to the terminal screen, and the appropriate terms are selected by the use of a light-pen. In the same way, terms may be combined with the help of Boolean operators.

The documents specified through the query constructed in this way are then retrieved from a collection of microfiches, and the document is projected on the screen.

The system has been developed by Pierre Riviere, a civil engineer concerned with the retrieval of maps and images. It is obvious that this intermarriage of computer and microfiche techniques is of great interest, especially in cases where the original image may not easily be represented by a computer-generated image on a video display unit - as for instance is the case with maps.

5.4 ITALY6

6 We are indebted to Dr. Constantino Ciampi of Istituto per la documentazione giuridica del Consiglio nazionale delle ricerche, Florence, for comments on this section.

5.4.1 Introduction

There is an impressive activity going on within the field of legal informatics in Italy. The interest in this field dates back to 1962, when the first articles appeared (cfr. Frosini 1972), and the Center for Automatic Documentation was founded in Milan (cfr. Ciampi 1974:693). This center has been mainly dedicated to solving documentation problems in the legal field, owing to the strong representation of lawyers on its board and staff. Among the projects undertaken by this center, one should mention the creation of a pilot model for computer-based retrieval with a data base of fiscal law, efficiency tests for classification systems, and experiments with a system of automatic classification of legal texts (the OROI project), cfr. Ciampi 1974:695 with further references.

As early as 1963, the Corte di cassazione began to take an interest in computer-based retrieval systems - and the result of this interest, the Italgiure system, will be described in some detail below.

[Page 95 ]


At the end of the 1960s, the Centro di Giuscibernetica (at the University of Turin) and the Sezione di documentazione automatica (now Reparto d'informatica) were established under the directorship of M.G. Losano. The Istituto per la documentazione giuridica was established in Florence as part of the structure created by the national research council (Consiglio nazionale delle ricerche), under the directorship of Constantino Ciampi. These centers also published two journals within the field of legal informatics - Systema published in Turin and Informatica e diritto published in Florence.

Informatica e diritto has succeeded the first journal of the Florence institute, Bollettino bibliografico d'informatica generale e applicata al diritto, which was published 1972-1973. Informatica e diritto is published quarterly; two issues are devoted to articles, two to the international bibliography on computers and law.

The Istituto per la documentazione giuridica has also developed a computer-based retrieval system based on a full-text philosophy, with approximately 10 000 bibliographic abstracts as data base (Banca dei dati bibliografici), cfr. Ciampi 1974:732. The system is terminal-oriented, and retrieval strategies are based on combining words by Boolean operators. The abstracts also include UDC numbers, which may be used as an argument in the query.

Among the more recent projects, one ought to mention the Camera '72 project carried out by a project group (CEAD: Comitato per l'Elaborazione Automatica dei Dati) appointed by the Camera dei deputati. This project looks into the possibility of a computer-based retrieval system for statutes passed after 1848, and seems to have opted for a full-text approach (cfr. Ciampi 1974:727).

We see that the three main projects in Italy also cover the three broad categories of legal sources: The Italgiure system is concerned with the retrieval of case law, the Camera '72 project is concerned with the retrieval of statutory material, and the system run by the Florence institute is concerned with the retrieval of legal literature. (There seems to be a tendency to see the Italgiure system as the nucleus of a total legal information system, a view that has brought forth critical remarks from Ciampi 1974:707.) Below we will briefly take a somewhat closer look at the Italgiure project, which has attracted international interest because of its original approach to the retrieval problem.

[Page 96 ]


5.4.2 The Italgiure System

The Italgiure project originated within the Italian Corte di cassazione, which each year issues about 12 000 decisions. These decisions have since 1924 been processed by the Ufficio del Massimario e del Ruolo in order to extract from the decisions a "massime" of the legal principles expressed in the decisions. Cfr. Schlagböhmer 1975:63-66 on the functions of this office. The aim was to satisfy the information need of the judges themselves and guarantee consistency in the decisions, cfr. Ciampi 1974:706. The "massime" averages about 1 200 characters with a title and a classification number, replacing the ratio decidendi of the decision, which averages some 15 pages (Borruso 1973:32).

Before making a decision, a judge must consult the accumulated compilation of "massimes". It is easy to imagine that this became a slow and laborious task as the years passed - and that the volume of "massimes" offered itself as a tempting data base for a computer-based system.

The critical comments by Ciampi (1974:711-713, 721-725) are of interest when it comes to establishing a perspective on the Italgiure system. He points out that in publishing the decisions, the criteria for selecting decisions are obscure, and that the full text of the decision is often replaced by a "massime" or an extract. This may in principle have the result (1) that the published body of decisions is not representative of the total volume of decisions; and (2) that the published "massime", or extract of a discussion, does not adequately reflect the content of the decision. In this perspective, the Italgiure system should rectify at least the first of these potential faults in the information system. The second would appear to be corrected if the author of the decision himself prepared the "massime", as is the rule for instance in the RIRA system. The Corte di cassazione did in fact try out this method in the years 1942-1949, but without success, as the "massime" produced was too specific in respect of facts and too abstract in respect of law, cfr. Ciampi 1974:722. Ciampi (1974:723) also points out that the Italgiure system has been developed as a retrieval system for "massimes", and that consequently basic questions like "how is jurisprudence best documented" have not been posed.

As mentioned above, the work on the Italgiure system started in 1963-1964 under the direction of Dr. R. Borruso. It was designed as an indexing system, but used a rather original approach to the indexing of the documents. (As Borruso 1973:40 points out, the selected method is in some respects very different from a conventional indexing scheme.) A thesaurus consisting of - in principle - all Italian words was constructed. These words were reduced to a small number of basic forms, so-called "seeds of language" ("semi del linguaggio"), cfr. Borruso 1973:35-36 for the characteristics of a "seed". The number of entries in the thesaurus was

[Page 97 ]


to begin with (1968) approximately 20 000, increasing to approximately 40 000 after two years of operation (1970). It is, however, estimated that the number of entries will not exceed 50 000, cfr. Tapper 1973:195. The terms included in the thesaurus are reduced to "seeds of language" or combinations of such seeds, of which there are approximately 3 000 (Borruso 1974:9). For instance the thesaurus entry "incendiary" will be represented by the "seed combination" "fire and (destruction or diffusion)", cfr. Borruso 1974:10, 1973:34-36.

Through the transformation of words by the thesaurus into "seeds", the problems of synonymy, homonymy, and hierarchical relationship are also resolved. The "seeds" are organized hierarchically; a general word will also imply all specific words, for instance the use of the word "animal" will include the word "dog", while the use of the word "dog" will not include the word "animal".
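The translation of a search word into "seeds" and the one-way hierarchical implication may be sketched as follows. This is only a minimal illustration under our own assumptions: the table names THESAURUS and NARROWER, the tuple notation for seed combinations, and the four sample documents are hypothetical and do not reproduce the actual Italgiure tables. Only the "incendiary" and "animal"/"dog" examples are taken from the description above.

```python
# Hypothetical miniature thesaurus (an assumption, not the Italgiure data).
# A word maps either to a single seed or to a combination of seeds written
# as a nested tuple: ("and" | "or", operand, operand, ...).
THESAURUS = {
    "incendiary": ("and", "fire", ("or", "destruction", "diffusion")),
    "animal": "animal",
    "dog": "dog",
}

# Hierarchy between seeds: each general seed lists its more specific seeds.
NARROWER = {"animal": {"dog"}}


def expand(seed):
    """Return the seed together with every more specific seed below it."""
    found = {seed}
    for child in NARROWER.get(seed, ()):
        found |= expand(child)
    return found


def matches(expr, doc_seeds):
    """Evaluate a seed expression against the seeds indexed for one document."""
    if isinstance(expr, str):
        # A general seed also matches documents indexed under a specific one.
        return bool(expand(expr) & doc_seeds)
    op, *operands = expr
    results = [matches(operand, doc_seeds) for operand in operands]
    return all(results) if op == "and" else any(results)


def search(word, documents):
    """Translate a search word into seeds and return ids of matching documents."""
    expr = THESAURUS[word]
    return sorted(doc for doc, seeds in documents.items() if matches(expr, seeds))


# Invented sample documents, each represented by the seeds assigned to it.
DOCUMENTS = {
    "m1": {"dog", "damage"},
    "m2": {"animal", "sale"},
    "m3": {"fire", "destruction"},
    "m4": {"fire", "insurance"},
}

print(search("animal", DOCUMENTS))      # general word: also finds the "dog" document
print(search("dog", DOCUMENTS))         # specific word: does not find "animal"
print(search("incendiary", DOCUMENTS))  # seed combination
```

In this sketch the hierarchy is applied at query time; whether Italgiure expanded the seeds at indexing time or at query time is not stated in our sources.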

The documents are organized into four libraries, one for constitutional law, one for civil law, one for penal law, and one for bibliographic notes. In 1974, the system comprised approximately 133 000 documents. (For a description of the four document formats, see Borruso 1974:4, cfr. also Schlagböhmer 1975:82-83.) This organization of the data base was revised in 1975-76.

The system is terminal-oriented and of a conversational type. The user selects his search words without any restrictions, and the words are automatically translated into "seeds" or combinations of "seeds" (as will be the case where the search word is a homonym), or he may modify the logic of the combination of "seeds" used to represent the word. The user may also choose to skip this phase and trust the thesaurus to make adequate transformations of his search words. Also, in some cases the search word is not (as yet) part of the thesaurus, in which case the user has to select an alternative synonym.

The user may also bypass the transformation to "seeds", which will cause the system to retrieve only documents containing words identical with the search term. There also exists information on about 1 000 expressions composed of two adjacent words with a special legal use - for instance "purchase tax". These expressions, "diabolical combinations" as Borruso (1973:36) calls them, may be used as single search terms.

The retrieved documents are displayed on the terminal, preceded by various feedback information. The Italgiure system also incorporates a number of other refinements - for instance the possibility of displaying how the retrieved documents are distributed on different areas of law

[Page 98 ]


(spectral analysis), masking of characters, etc. No performance tests (estimating recall and precision ratios) have been carried out, cfr. Borruso 1973:41.

The system has been in various stages of operation over a period of years. In 1974, Ciampi reports that the system operates between 9 a.m. and 7 p.m. and is available at 10 terminals in the Palace of Justice and at terminals located at all the appeal and tribunal courts (1974:700). The computer installation is a UNIVAC 1106, and it is reported by the manufacturer that in 1974 more than 80 terminals were connected to the Italgiure system all over Italy. Cfr. Schlagböhmer 1975:66-67 for a description of the terminal network. There is therefore reason to believe that this terminal network is at present the biggest dedicated to legal information retrieval in Europe.

5.5 SWEDEN7

7 We are indebted to the Head of the Planning and Budget Secretariat at the Ministry of Justice, Mr. Börje Alpsten, for comments on this section.

In 1973 Sweden passed the first national Data Protection Act, a piece of legislation that has become renowned as a pioneer effort in the field of privacy. To some extent, the fame of the Data Protection Act has eclipsed the systematic work done in the fields of legal informatics. Sweden was probably the first country to adopt a plan for the development of a national legal information system. The initiative was not taken by a private organization or a single body within the public administration, but by the Ministry of Justice, which in 1966 made a survey of the possibilities offered by computer technology. This survey covered the whole field of authority of the Ministry of Justice, including the police, public prosecutors, prison administration, courts, and the Ministry itself. The results were outlined in a brief of 1 December 1966, and at the end of the same month (30 December 1966) a national co-ordination council was established (Samarbetsorganet for ADB inom rättsväsendet), cfr. Freese/Skoog/Tael 1972:11.

This council has co-ordinated the development of computer-based systems within the area indicated above. A number of projects were initiated within the police and prison administration - projects which through the co-ordination of the council emerged as pieces of a bigger puzzle, cfr. Trotzig 1971. The emerging national justice system was named Rättsväsendets Informationssystem (RI), and was structured in a

[Page 99 ]


number of stages. One of these was the adoption of a computer-based retrieval system for statutes and case law, and below we shall discuss this stage in further detail.

RI is in fact founded on regulatory law stating the responsibility of the administration in regard to the information system and the co-ordination council (Kungl. Maj:ts kungörelse 1970:517 with amendments of 1971:825).

Because of the structure underlying the Swedish RI system, the emphasis was from the very start on the user and the administration of justice. Consequently the RI system has adopted a refreshingly practical view of the possibilities of legal information retrieval, wherein not only the performance of the system in terms of retrieval efficiency has been dominant, but the feasibility of the system in economic terms and administrative effects has also been stressed. In this way, the RI system has become a down-to-earth system, without the extravagancies of spectacular retrieval strategies, but with solid practical results. This is partly due to the influence of the head of the Planning and Budget Secretariat within the Ministry of Justice, Mr. Börje Alpsten, and his former superior (now Chancellor of Justice), Mr. Ingvar Gullnäs.

In the early years of RI, tests were made in information retrieval using the IBM system DPS (which has also been used by CEDIJ in France and CELEX). This was not found satisfactory, mainly because of the lack of on-line dialogue functions (which were handled by an auxiliary program in the system of CEDIJ). A switch was made to the system IMDOC, which has been developed by the firm Industri-Matematik - and which in its advanced version is a very satisfactory retrieval system, with high efficiency measured in storage space and updating capabilities (cfr. Alpsten 1975, Leimdörfer 1973, and below in section 7.3, where IMDOC is described in further detail).

The text retrieval system IMDOC is in fact being utilized by several bodies within the Swedish public administration today; in 1976 there were 20 terminals connected to the system. Within the Ministry of Justice it is used for retrieval of statute law (in 1976 there were 400 statutes or regulatory laws in the document collection), case law, and treaties. It is also used for retrieving reports from public commissions, and for searching the register of all statutes and regulatory law in force in Sweden (Svensk Författningssamling, SFS), cfr. Samarbetsorganet 1972:11. This register, which in 1976 comprised 4 400 entries, is kept updated, and to our knowledge it is rather unique; most countries, including - for the time

[Page 100 ]


being - Norway (though this situation will probably soon come to an end), do not have a complete and up-to-date register of all statutes and regulatory law in force.

The treaties mentioned as part of the data base are the English version of the Council of Europe treaties, which have been acquired with the consent of the Council from the trials conducted on retrieval of international law with the British STATUS system. The machine-readable text has been converted to IMDOC format without any major difficulties.

It should be mentioned that the Ministry of Justice has selected a statute, not a section of a statute, as a document for retrieval purposes. Contrary to general belief, the Ministry maintains that this is adequate, as IMDOC permits retrieval on two levels. First-level search isolates possible relevant statutes through a conventional query, second-level search employs the focus-function of IMDOC for browsing through the statute. This solution is economically sounder, and makes interface with text-processing systems (for instance for photocomposition) simpler, the Ministry points out. We feel we shall have to join the sceptics - at least until a more satisfactory "multilevel" search function has been introduced.
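The two levels, and the reason for the sceptics' reservation, may be illustrated by a small sketch. It rests entirely on our own assumptions: the sample statutes are invented, and the functions first_level and focus are hypothetical stand-ins which do not reproduce the actual query language or the focus-function of IMDOC.

```python
# Invented statutes; each statute is one document consisting of sections.
STATUTES = {
    "Act A": [
        "the buyer may cancel the purchase",
        "damages are paid by the seller",
    ],
    "Act B": [
        "the buyer may claim damages",
        "the act enters into force at once",
    ],
}


def first_level(terms, statutes):
    """Conventional AND-query with the whole statute as the document unit."""
    hits = []
    for name, sections in statutes.items():
        words = set(" ".join(sections).split())
        if all(term in words for term in terms):
            hits.append(name)
    return sorted(hits)


def focus(terms, sections):
    """Second level: numbers (from 1) of the sections containing any term."""
    return [
        number
        for number, section in enumerate(sections, start=1)
        if any(term in section.split() for term in terms)
    ]


query = ["buyer", "damages"]
for name in first_level(query, STATUTES):
    print(name, focus(query, STATUTES[name]))
```

Both statutes satisfy the first-level query, but in "Act A" the two words occur in different sections and are unrelated. With the statute as document unit such a hit is only exposed at the second level, when the user browses the sections.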

In our opinion, the most interesting use of the system is that made by the administrative courts. The system is here employed for a dual purpose. Primarily it is used to keep track of registered cases. A short summary of the case is registered when the case is filed in court, along with a number of standard data, like the names of the parties, journal number, etc. (The form is reprinted in Samarbetsorganet 1972 Bil. 1.) On 18 June 1975 the number of such summaries was 114 000, cfr. Alpsten 1975:9. This constitutes a computer-based journal which in one sense is shared by all the courts geographically located in different towns. This facilitates the co-ordination of the cases both in regard to procedures (two cases entered in respect of one person at two different courts may easily be merged), and with respect to substantive law (the court will not take different views upon similar cases pending before different courts). In this philosophy we once more recognize the needs that in the United States gave birth to the RIRA system.

Actually, the IMDOC journal system for the administrative courts is of general interest. It does, for instance, have defined fields - like the date of the decision. This is usually a possibility not recognized in conventional full-text systems, where a query using a date may just as well retrieve a document in which that date occurs in another context - for instance as the date when a contract was broken or one of the parties filed his claim. This difficulty is solved simply by adding prefixes to the information, and so defining the information in a unique way. The date of the decision is, for instance, defined by the prefix "D:", and the query

D:1.1.1969

will only retrieve documents decided on that day. Also, the journal has facilitated the

[Page 101 ]


interface between the courts and the public, who in their enquiries to the courts often omit the journal number or other forms of identification. The story is told of a postcard that was completely illegible except for the postmark and the first name of the enquirer. This was sufficient for finding the correct case among the thousands registered in the computer-based journal.
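The effect of a prefix such as "D:" on date queries may be sketched as follows. Only the prefix "D:" and the query form D:1.1.1969 are taken from the description above; the two journal entries and the function search are invented for the illustration, and the real IMDOC journal format is not reproduced.

```python
# Invented journal entries. The decision date carries the prefix "D:";
# other dates occur unprefixed in the running text.
JOURNAL = {
    "case-1": "D:1.1.1969 claim concerning a contract broken on 5.3.1968",
    "case-2": "D:7.2.1969 the party filed his claim on 1.1.1969",
}


def search(term, journal):
    """Return ids of entries in which the term occurs as a complete word."""
    return sorted(
        case for case, text in journal.items() if term in text.split()
    )


print(search("D:1.1.1969", JOURNAL))  # decided on 1 January 1969
print(search("1.1.1969", JOURNAL))    # the date occurs in another context
```

Since the prefix is part of the stored word, the prefixed query reaches only the decision date, while the unprefixed query reaches the case in which the same date is the day the claim was filed.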

Secondarily, the IMDOC system is used for retrieval of precedents. Cases of general value are stored in full text, and the courts (as well as the Ministry of Justice) may have access to these decisions through the terminals using IMDOC. In Summer 1976 the number of full-text documents was 1 900, or 1.5 per cent of the number of summaries, cfr. Alpsten 1975:9.

Actually this demonstrates how the retrieval system at the administrative courts is an integrated part of a greater, total system. The cases are first registered with the aid of a text-processing system, the IBM system ATMS. With the assistance of ATMS, an authorized proof of the decision is obtained. The stored document is then, by a bridge programmed especially for this purpose, transferred to the IMDOC data base. The same text is used twice a year as raw material when producing the case reports published by the administrative courts. The printing is done by a commercial firm without extensive editing or the inserting of symbols to govern the graphical layout. This system, comprising subsystems for text processing, text retrieval, and printing, is to our knowledge one of the few fully integrated legal information systems in operation. The operation is, of course, facilitated by the comparatively small amount of material produced by the administrative court - and by the co-ordination through the national council. An extension to comprise the general courts is now under consideration, cfr. Justitiedepartementet 1975.

The case reports from the administrative courts were used as data base in a controlled experiment in text retrieval conducted in 1974 by the Norwegian Research Center for Computers and Law in co-operation with the Swedish Ministry of Justice, cfr. Harvold/Bing 1974, including a description of the system and IMDOC. Most descriptions of the RI system are, of course, in Swedish; for English descriptions, see Alpsten 1972 and 1973, for a German description, see Carsten 1971.

The Ministry of Justice set out with a simple version of IMDOC, which offered only the Boolean and-function and had several other limitations. These did not seriously hamper the users, as most of them conducted rather simple searches. This may indicate that some systems are offering sophisticated search strategies of doubtful value to the user. The Ministry has, however, now acquired a more advanced version with the full spectrum of Boolean operators, ranking functions, etc.

[Page 102 ]


Before leaving Sweden, we should like to point out the valuable work carried out at the Faculty of Law at Stockholm University by the Working Party for EDP and Law set up in 1967 by lic.jur. Peter Seipel. The articles of Seipel (1970) are basic in Scandinavia, and his own work and co-projects with the Working Party for Quantitative Linguistics (KVAL) have enriched the RI-project with valuable insight - especially into the possibilities of generating automatic indexes to legal publications (case reports, gazettes containing new legislation, etc.).

5.6 WESTERN GERMANY8

8 We are indebted to Regierungsrat Albrecht Berger of Bundesministerium der Justiz for comments on this section.

5.6.1 Introduction

One would have thought that the computer would offer a tempting technology to the well-organized minds of German lawyers. However, no real development seems to have emerged until the late 1960s - to some extent touched off by the World Peace Through Law conference in Geneva (above at section 4.4.1), cfr. Fiedler 1968 with references.

But when the interest caught on, it soon blossomed into enthusiasm, initiative, and projects.

The Federal Republic of Germany has - like the United States of America - a federal constitutional structure with a corresponding complexity of the legal structure. It would seem that the information crisis - which had been emphasized in the early days of the American development - was felt even more strongly in Germany. This may have several reasons, one of them probably being the more active part federal and local public administration plays in Germany. Basic differences in the legal systems may also have been important, for instance the emphasis on case law in the Anglo-American system. This may have contributed to lending greater importance to "rule of law" arguments ("Rechtssicherheit") in Germany than in the United States.

In early papers the information crisis in law was pointed out (cfr. Simitis 1967, Fiedler 1968:273-274), and it was brought home by the monograph of Professor Spiros Simitis of 1970: Informationskrise des Rechts und Datenverarbeitung. And at the same time, the first projects concerned with computer-based legal information retrieval systems were created, cfr. the status report at the 48. Deutschen Juristentag (Deutscher Juristentag 1970).

[Page 103 ]


5.6.2 Juradat

One of the first efforts was the project sponsored by the private firm Juradat GmbH & Co. KG, which was announced at a press conference in March 1970 in Berlin, cfr. Haft 1970:169. The data base was to be composed of case law, which was registered in extracts: the original text of the documents was registered, but only selected parts of general or principal interest. The registration started late in 1969, and the system was demonstrated in 1971, having then a data base of 30 000 documents, a planned expansion to 60-80 000 documents with a yearly increase of 2-5 000 documents, cfr. Das Juristische Informationssystem 1972:408.

Software was developed by Juradat for a UNIVAC computer, and was said to be of a rather original design, featuring dialogue techniques, search strategies based on Boolean operators, thesaurus and automatic synonym functions.

Juradat ceased its operations in March 1973 owing to financial difficulties.

5.6.3 DATEV - Steuerrechtsdatenbank

Another private initiative was the one taken by the institution Documenta Steuer und Recht founded by Deutschen Wissenschaftlichen Steuerinstitut der Steuerbevollmächtigten, three publishing houses specializing in legal literature, and DATEV (Datenverarbeitungsorganisation der Steuerbevollmächtigten für Angehörigen des steuerbearbeitenden Berufes in der Bundesrepublik Deutschland), cfr. Sebiger 1971:70. In 1969 an agreement was made with Finanzverwaltung des Bundes, some states and IBM-Germany for an experimental project in tax law. The aim of this effort was to develop standards for a documentation system within the tax administration; 52 per cent of the persons asked stated their interest in a computer-based system (68 per cent in the group younger than 40 years), cfr. Sebiger 1971:70.

The project selected a data base of decisions by the Bundesfinanzhof (1 073 decisions from 1951-1969) on corporation tax law and a set of related regulatory laws. The decisions were structured in a number of fields (cfr. Sebiger 1971:72), and registered in full text. In the experiment the retrieval was conducted by the IBM program DPS (which, as we have seen, has also been used by CEDIJ in France and CELEX in Brussels).

According to Sebiger 1971:72 the document is structured in 14 fields. The first 10 of these contain bibliographical information, field 11 contains descriptors selected from a list, and

[Page 104 ]


fields 12-14 the text of the decision, structured into introduction, facts of the case, and comments of the court.

Since the conception, the ambition of the "Steuerrechtsdatenbank" has been escalating. It is now the goal to have a universal documentation of tax law: statutes, decisions, and literature. The documents are still registered in full text by optical character readers. A stop-list of 256 words is used, reducing the full text by some 39 per cent, cfr. Simitis 1974:12, still leaving about 58 000 different word forms in the data base. Also, the retrieval system was changed from DPS to the terminal-oriented dialogue system STAIRS, cfr. below in section 7.2. An example of the dialogue may be found in Conradi 1975.
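The working of a stop-list may be sketched in a few lines. The sketch is only an illustration under our own assumptions: the six stop words and the sample sentence are invented, since the actual list of 256 words is not reproduced in our sources, and the reduction is here measured on running words.

```python
# Invented stop-list of six German function words (the real list had 256).
STOP_WORDS = {"der", "die", "das", "und", "in", "von"}


def index_terms(text):
    """Return the running words that remain after the stop-list is applied."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]


def reduction(text):
    """Share of the running words removed by the stop-list."""
    words = text.lower().split()
    return 1 - len(index_terms(text)) / len(words)


sentence = "Die Entscheidung des Bundesfinanzhofs und die Auslegung der Vorschrift"
print(index_terms(sentence))
print(round(100 * reduction(sentence)), "per cent removed")
```

The figure obtained for a single invented sentence has, of course, no bearing on the 39 per cent reported for the "Steuerrechtsdatenbank"; the sketch only shows how such a ratio is arrived at.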

In 1975 the data base consisted of about 20 000 documents: The published decisions of Bundesfinanzhof since 1950, responsa from the fiscal administration, articles, papers, commentaries on important regulatory law in tax law journals from 1974, decisions of courts of first instance since 1972, cfr. DSWR sonderheft 1975:6.

The "Steuerrechtsdatenbank" became operative in March 1975, and users were connected to the central data bank through a terminal network. In September 1975 a user research project was launched in order to evaluate the total system.

Computer-assisted administrative systems play a major role in the German tax administration - and the development of a legal reference retrieval system is only one of the projects in which DATEV is engaged. Since 1971 DATEV has published the journal Datenverarbeitung in Steuer, Wirtschaft und Recht (DSWR), in which current development is discussed.

5.6.4 Bundessozialgericht

Also, within the other major branch of public mass administration - the social security administration - a need for an improved legal information system was felt. Since 1954, the Bundessozialgericht has maintained a manual documentation system, covering all types of legal sources. By 1970 this system had grown to include 450 000 file cards (Brackmann et al. 1974:9), and the increased activity of legislators and other producers of legal sources within the field of social security law made the system a victim of the legal information crisis. A project group was set up by the Bundessozialgericht to look into the possibility of finding a computer-based solution. This group drew up the basic design for a social security reference

[Page 105 ]


retrieval system, cfr. Brackmann et al. 1972. This basic design was to a great extent a computerized version of the existing manual system.

The group decided to have the documents represented as surrogates. These are of five different types, corresponding to statutes, statutory material, regulatory law, literature, and decisions (cfr. Brackmann et al. 1974:15). Each of these categories has a number of fixed fields (2-6) as well as a field for an abstract. The abstract is composed of all terms that may adequately describe the content of the document. These terms may be selected without the restriction of a predefined vocabulary, though there are - of course - some norms to be considered while composing the abstract, cfr. Brackmann et al. 1974:14.

A test data base composed of 1 200 document surrogates related to family allowance payment was created. The surrogates represented a wide spectrum of document types: decisions, articles from legal journals, statutes, monographs, parliamentary material, etc., cfr. Brackmann et al. 1974:37-38 for examples of the surrogates and the corresponding original documents. In these documents, 41 per cent of the indexing terms constituted the abstract, while 59 per cent of the terms occurred in the fixed field (as "aspektgebundene Deskriptoren"), cfr. Brackmann et al. 1974:20.

For the retrieval, the GOLEM I system, developed by the German computer manufacturer Siemens, was employed. The system permits the use of Boolean operators, search words being specified and combined in two steps. Examples of dialogue are given by Brackmann et al. 1974:28-31.

The experiment was brought to an end in 1973, and the plans for a more comprehensive "Sozialrechtsdatenbank" are part of the grand JURIS design. GOLEM I has now been replaced by GOLEM II, assisted by the analytical text program PASSAT, which creates a lexicon of all words occurring in the data base. A short description of GOLEM with PASSAT will be given below in section 7.4. The work on creating a data base of better coverage has commenced; by March 1976 the data base included 20 000 decisions, cfr. Berger 1976:21.

It will be noticed that Bundessozialgericht, in contrast to the DATEV group, chose to have documents represented as abstracts rather than in full text. The reasons for this may be several, but the strong ties between the present manual and the projected computer-based system appear to be an obvious explanation. It may also be a valid justification, since the users at the court certainly are best able to examine the benefits and

[Page 106 ]


disadvantages of the existing system. Also, registration of the 450 000 documents represented by file cards in full text must be rather prohibitive. On the other hand, some types of documents - i.e. the Court's own decisions - could for the future be registered in machine-readable form at the source at negligible extra cost. Cfr. the critical discussion of Simitis 1974:9. Such plans exist for the JURIS sub-project on tax law (cfr. below at section 5.6.5.); decisions by the Bundesfinanzhof will be written by the text-processing system INFOREX 5 000 for easy updating of the data base.

5.6.5 JURIS

The two projects within the tax and social security administration described above were both started at the end of the 1960s. In October 1970 the Bundesministerium der Justiz (BMJ) established a project group to look into the possibilities of improving the legal information system through computer-assisted solutions. This group published the results of its findings in January 1972 in the comprehensive report Das Juristische Informationssystem - Analyse, Planung, Vorschläge.

A summary of the report is published in English as "The Legal Information System - Analysis, Planning, Proposals", Council of Europe EXP/Ord. Jur. (72) 15. The project group was composed of representatives of BMJ, Gesellschaft für Mathematik und Datenverarbeitung GmbH (GMD) Bonn, and the consulting company C-E-I-R GmbH, Frankfurt.

On the basis of this report, a "development system" was launched, directed by Dr. Josef Fabry of the BMJ. This "development system" was designed to acquire further information for the "completion system", and a number of choices were postponed till the experiments of the "development system" were completed.

One of the major choices is, obviously, the selection of a retrieval system. In the "development system" several possible retrieval systems were to be tested. The activities of Bundessozialgericht were based on the Siemens software package GOLEM I. For the activities within tax law, the STAIRS system of IBM was used.

Above at section 5.6.3 the activities of DATEV within the field of legal information retrieval have been described. As part of JURIS, there is also a sub-project of tax law, which has more or less the same scope as DATEV's activities. The JURIS sub-system is operated by BMJ, Bundesfinanzministerium and Bundesfinanzhof, in direct competition to the system operated by DATEV. It has evidently not been possible to find a form of co-operation that does

[Page 107 ]


away with this double effort. The JURIS sub-project for tax law used the STAIRS system originally, but has now converted to GOLEM according to the choice discussed below. DATEV does, however, still use STAIRS.

In addition the BMJ established a smaller test data base of constitutional law which was designed to be system-independent. This was used for benchmark tests against different retrieval systems - in this respect the BMJ co-operated with the GMD. In addition to STAIRS and GOLEM, the retrieval system TELDOK of Telefunken has been tested, as well as the system TR/1, developed by GMD.

The "Text retrieval system/1" (TR/1) deserves some special comments. It was developed in 1972-75 by H.-Georg Krämer of the GMD, and has several features not commonly found in a general text retrieval system. This includes a modular data bank structure, a floating document unit for retrieval purposes (the user may specify the document unit for his Boolean query), documentation of statutes later replaced by amendments, right- and left-hand truncation, and the exploiting of citation structures in statutes. The system has, however, only been tested on the small data base of constitutional law, but there is no reason to expect that the performance would be much less satisfactory on a greater data base. This system is not very well known through literature, but a description may be found in Krämer 1975.
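Of the features mentioned, right- and left-hand truncation is perhaps the one most easily shown by example. The following sketch is our own illustration and not a description of TR/1: the asterisk as truncation mark, the function match_truncated, and the small vocabulary are all assumptions.

```python
def match_truncated(pattern, vocabulary):
    """Return the words of the vocabulary matching a truncated search term.

    An asterisk at the end marks right-hand truncation (any ending is
    accepted); an asterisk at the start marks left-hand truncation (any
    beginning is accepted). Without an asterisk the word must be identical.
    """
    if pattern.endswith("*") and not pattern.startswith("*"):
        stem = pattern[:-1]
        return sorted(word for word in vocabulary if word.startswith(stem))
    if pattern.startswith("*") and not pattern.endswith("*"):
        stem = pattern[1:]
        return sorted(word for word in vocabulary if word.endswith(stem))
    return sorted(word for word in vocabulary if word == pattern)


# Invented vocabulary of word forms occurring in a data base.
VOCABULARY = ["steuer", "steuerrecht", "sozialrecht", "rechtsprechung"]

print(match_truncated("steuer*", VOCABULARY))  # right-hand truncation
print(match_truncated("*recht", VOCABULARY))   # left-hand truncation
```

Left-hand truncation is of particular interest in a language like German, where the significant element of a compound word often comes last.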

As stated above, the "development system" is just one stage in the missile designed to reach the target: an all-inclusive legal information system for Germany, the design known as JURIS (Juristisches Informationssystem).9 Through the "development system" a number of problems were analyzed and tested. The co-operation with GMD has proven fruitful, and researchers like Professor Herbert Fiedler and Dr. Friedrich Gebhardt have made valuable contributions not only to the JURIS system, but to legal informatics as a whole. A summary of the projects and their results may be found in Gebhardt 1975. Also, comprehensive user research was initiated, cfr. Jungjohann/Seidel/Sörgel/Uhlig 1974.

9 Some confusion may arise from the fact that the acronym is identical to the one denoting the Justice Retrieval and Inquiry System, cfr. above at section 4.4.4. It is, however, an obvious acronym, as is shown by the fact that the research program in legal informatics at the Norwegian Research Center for Computers and Law was until 1972 known as JURIS - JURidisk InformasjonsSystem. This has later been changed to NORIS in order to reduce the confusion somewhat.

As stated above, JURIS is designed to become a comprehensive system, documenting all types of legal sources to all lawyers in the whole of Germany. This is an ambitious goal which cannot be reached without major investments over a rather long period. The "development system" alone included (1976) a data base of 20 000 decisions by

[Page 108 ]


Bundessozialgericht, 10 000 tax law cases, and a terminal network of 10 terminals. JURIS (1975) gives the project status at the end of 1975.

Several alternatives for the "completion system" have been discussed in theory. The most inclusive alternative has a data base composed of a retrospective documentation of some 2 500 million characters, and an annual growth of 1 800-2 200 million characters. Access to this base is envisaged through a network of 3 - 40 000 terminals, cfr. Das Juristische Informationssystem 1972:256-258. Costs will vary widely with the different choices of data base coverage and terminal network size, from 15 to 60 million DM, and a suggested annual cost for data registration and updating of 10 million DM (1972).

The "development system" has now moved past the first cross-roads: JURIS has selected the Siemens system GOLEM II as a basis for further development.

A project has been launched by Siemens which is expected to result in a new and improved system, provisionally known as CONDOR (Communication in Natürlicher Sprache mit Dialog-Orientierten Retrieval-Systemen), cfr. Banerjee/Reinhardt 1974.

It is true that today the figures of the "completion system" are under revision, but they certainly demonstrate the scope of the JURIS project. It is a major effort on the part of the BMJ to tidy up the legal information system. This implies, of course, much more than streamlining the existing system; it is in its own right a major change in the infrastructure of the German legal society. Just because of the grand scale, the social consequences of this change are more apparent than in the systems we have presented so far - though such social implications will always be present when an information system is being changed. Das Juristische Informationssystem (1972:267-273) discusses some of the political, legal, and social implications. As might be expected, critics have found this discussion to be less thorough than it ought to have been. Simitis (1974:13) claims for instance that the development of the legal information system is increasingly being projected into "eine völlig unkritische, rein technische" dimension.

A well-known effect of technological innovation is that one questions the basic functions and design of the changing system. The critical discussion in Germany is an example of how lawyers, through a proposed, basic change have become aware of the relationship between the information system and their job situation. Certainly this relationship is something not established by the new technology - the proposed

[Page 109 ]


innovation has only brought it to light. A more critical attitude toward the existing legal information system would probably be the best strategy for guiding the technological development into the most desirable directions. An effort should be made to ignite a discussion at an early stage to prepare the ground for technological change. In some countries - like Germany - the technological change may have taken most lawyers off guard. In other countries - like Norway - there is still time.

5.6.6 Concluding remarks

In this brief description of Germany, the role played by the university centers ought to be mentioned. We have already touched on the research institution GMD, which has its own Institut für Datenverarbeitung im Rechtswesen. We may also mention Professor Wilhelm Steinmüller's Arbeitsgruppe Rechtsinformatik at the Universität Regensburg, Professor Spiros Simitis' Forschungsstelle für juristische Dokumentation in Frankfurt, and the Forschungsstelle für juristische Informatik und Automation at Universität Bonn under Professor Herbert Fiedler (who is also the director of the Institute within GMD).

This rather strong academic engagement is reflected in the literature: a number of journals are published which are more or less devoted to legal informatics and computer-assisted administrative systems. We have already mentioned the DSWR published by DATEV. The Forschungsstelle für juristische Dokumentation in Frankfurt publishes a journal entitled Kybernetik - Datenverarbeitung - Recht (KDR); the first volume came out in 1971. J. Schweitzer Verlag of Berlin publishes three journals: Arbeitspapiere Rechtsinformatik (first volume 1970), EDV und Recht (first volume 1970), and Datenverarbeitung im Recht (DVR; first issue 1972). A more practical orientation, similar to that of DSWR, can be found in Öffentliche Verwaltung und Datenverarbeitung (ÖVD), published since 1971 by Verlag W. Kohlhammer.

One may also mention the documentation service offered by REDOK at Regensburg, initiated in 1973 and later integrated into the service offered by Istituto per la documentazione giuridica, Florence - cfr. above in section 5.4.1. The REDOK service was preceded by the bibliography JUDAC (Schubert/Steinmüller 1971). A number of additional index-oriented legal information systems are operational - cfr. for instance Berger 1973:336-337, 1976:12.

From our short exposé it will be evident that German publishers are active in the field of legal informatics - more active than their English or

[Page 110 ]


American counterparts (though major publishers like Butterworth and West have ventured into the field). We should also remember that three publishing houses were part of the Documenta group (above at section 5.6.3). The common interest of these publishers has given birth to an organization, Verlegervereinigung Rechtsinformatik, with 29 publishing houses as members. The all-inclusive design of JURIS has made evident the conflict of interest which exists between the publishers of established journals and literature, and the state monopoly represented by the legal information system. Actually, Das Juristische Informationssystem (1972:271-272) discusses this problem, and the conflict is further debated in a report of the Verlegervereinigung Rechtsinformatik (1975), "Staatliche Rechtsdokumentation - Gefahr für die Juristische Fachliteratur".

It is evident that the resources and perspectives related to the JURIS project have made Germany perhaps the most interesting country with regard to legal informatics. Several factors cause this: the grand, all-inclusive design of JURIS, the research that has been conducted within the frames of the "development system", the critical studies and discussions sparked off by the "development system", and the number of scholars and others who have become active within the field. In conclusion we should like once more to draw attention to Sweden, where a national plan for a legal information system has also emerged (actually at an earlier date) through RI. The differences between the strategies adopted by the two countries are - and must evidently be - numerous. But perhaps the most striking difference is the basic philosophy: in Germany the JURIS system is envisaged as growing at a constant rate until it becomes the all-inclusive legal data bank accessible to all; in Sweden the RI is a careful patchwork, where bits and pieces are cut out of the traditional system and sewn onto the computer-based design whenever a need is politically realized in parts of the justice structure. Although one may not be able to choose, one ought to consider which of these philosophies is the most adequate.

5.7 UNITED KINGDOM10

10 We are indebted to Dr. Bryan Niblett, Professor in Computer Science, of University College, Swansea for comments on this section.

5.7.1 STATUS

One would have thought that Britain, owing to the early Oxford experiments conducted by Tapper (cfr. above at section 4.3.4 and below at section 11.3.2), would pioneer the field of legal informatics in Europe. But

[Page 111 ]


the first project of any scope was not launched till 1968-69, and then by an unlikely sponsor, The United Kingdom Atomic Energy Authority. This was partly due to the personal initiative of Bryan Niblett, a barrister and scientist engaged at that time by Culham Laboratories. He was fascinated by the possibilities offered by the computer (cfr. his early paper "The Computerization of the Statute Book", 1968), and together with Norman Price, created the first version of STATUS (STATUte Search), running on Culham Laboratories' KDF9 computer, cfr. Niblett/Price 1969. This was a batch version; the first interactive version was implemented in 1970, cfr. R-direktoratet 1974 vedlegg 1.

Today STATUS is the property of the Atomic Energy Research Establishment at Harwell, and its development is still being directed by Norman Price. We shall give a brief description of the retrieval system below at section 7.5 and will therefore here mainly be concerned with STATUS as a project. But one feature of the programs ought to be emphasized: unlike most other text retrieval systems, they are written in a high-level language (FORTRAN). This makes STATUS rather more portable than other systems. It has also been implemented on a number of different computers: IBM 370, PDP 11, Modular 1 and ICL 1900. The portability also makes the system commercially attractive, and STATUS has been purchased by the Norwegian government for use within the public administration, as well as by the Dutch legal publishers Kluwer.

In Norway, the programs have been converted to a Honeywell-Bull version, cfr. R-direktoratet 1975, and recently to a UNIVAC version by the University of Bergen. This probably makes STATUS the text retrieval system operative on the greatest number of computers, and also probably the first to be implemented on a mini-computer (PDP 11).

The original data base of STATUS was the Acts of Parliament and associated delegated legislation dealing with atomic energy - 770 documents consisting of 140 000 words (cfr. Niblett/Price 1970:290). This was, of course, too small a volume on which to assess the performance of STATUS. Additional material has been made available by a legal publisher (Butterworth): 350 000 words of tax statute law, including the full text of the Income and Corporation Taxes Act of 1970. This enrichment also demonstrates the close ties between computer-based typesetting and retrieval systems - something which is further emphasized by the cooperation with H.M. Stationery Office, which will have an up-to-date compilation of Acts of Parliament known as Statutes in Force, from

[Page 112 ]


which the full text of these statutes will be converted into machine-readable form as a byproduct of the printing process.

The first users of the STATUS system for retrieval purposes were, however, not English lawyers, but the Council of Europe. As mentioned above in section 5.1.2(1), the Council of Europe has been active within the field of legal informatics. To gain insight into the possibilities of full text systems, the European Treaty Series has been converted into a data base at Harwell. The full text of both the English and the French versions has been registered, each set consisting of 73 documents of approximately 200 000 words, cfr. Price/Bye/Niblett 1974.

Actually the Treaty Series holds interest for several countries, and with the consent of the Council of Europe, both Sweden and Norway have been given copies of this data base for retrieval by the IMDOC system (Sweden) and the Norwegian version of STATUS.

A user experiment has been conducted with STATUS, the data base having been made available to Members of Parliament by way of terminals. However, the first British users turned out not to be lawyers, but the Chemical Emergency Service and Safety in Mines Research Establishment, cfr. R-direktoratet 1974, vedlegg 1. This reminds us of the fact that a text retrieval system - though developed for legal reference retrieval - is a general system, as is also proved by the general nature of other commercial systems like STAIRS and GOLEM.

A new version - STATUS II - has recently been developed in order to improve various aspects of the older version, which experience had indicated could be modified with advantage. An outline of the changes may be found in Price 1975.

5.7.2 QUOBIRD

The STATUS project was initiated in 1968. The same year a joint project was set up at the Queen's University, Belfast, between the Computer Centre and the Faculty of Law.

This project aimed at developing an off-line Case and Statute Citator for Northern Ireland. Like other small countries, Northern Ireland suffered from a lack of adequate legal indexes - the last Citator had been produced in 1953, and it was hoped that the project could provide such an index, and at the same time contribute to a better understanding of the potential of computer-based legal information systems, cfr. Aitken/Campbell/Morgan 1972:25.

[Page 113 ]


The Citator was successfully completed in April 1970. The project did, however, reveal drawbacks of the system - one of the major ones being the necessity of having professional lawyers to examine the statutes and underline the words to be indexed. One aspect of this we have encountered several times in this historical survey: the suspicion that the indexer is not being as consistent and thorough as desired. But the main point was that there just was not a sufficient number of competent lawyers who would agree to do the painstaking and boring work of manual indexing (QUOBIRD 1974).

In the Department of Computer Science and the Computer Centre, a large research group on interactive computing had been set up; benefiting from this expertise, a basic design for a conversational retrieval system was drawn up in mid-1968, the planning period being extended till March 1969 and a pilot system being implemented in the following five months. Building on the experience gained through these projects, a first interactive reference retrieval system was designed and implemented: BIRD 1. This system was developed during 1970. Tests with the system on a data base of 400 000 characters and mathematical simulations led to modifications, and the final version of BIRD 1 was completed at the end of 1971. Cfr. Lancaster/Fayen 1973:102-103 for examples of the dialogue and a short characterization.

BIRD 1 had primarily been designed as a research tool to study on-line information systems. Changing the perspective to the needs of the users led to a different basic design: a system that can easily be moulded to meet the different needs of different groups of users. The next system - BIRD 2, more commonly known as QUOBIRD - was developed between mid-1971 and mid-1974. Further improvement, especially in updating efficiency, has since taken place. It is now being offered on a commercial basis by ICL.

It is estimated that QUOBIRD has taken approximately 30 man years of research and development over more than 6 years. The resulting product is a high standard interactive reference retrieval system, independent of operating system and written in FORTRAN. The retrieval language includes all Boolean operators, synonym functions, right-hand truncation, facilities for storing results of a search, etc.

A comprehensive documentation may be found in QUOBIRD 1974. Efficiency assessments and system performance are discussed in QUIS 1974:33-38, a report also containing rather comprehensive statistical information on word distribution etc. in the test material.

[Page 114 ]


QUOBIRD is the result of a research project, and has as yet not become operative in any legal information system. Legal documents are, however, part of its data base (Northern Ireland Constitution Acts, and Westminster Statutes in Force). It appears probable, however, that further development will take place on the ground fertilized by the QUOBIRD project, cfr. Campbell 1975.

5.7.3 Concluding remarks

As stated above, it is rather surprising that Great Britain has not come further along the road of computers applied to law. This is also felt in Britain, cfr. Aitken/Campbell/Morgan 1972:1. In order to promote the use of computer-based systems, the Scottish Legal Computer Research Trust was founded in January 1970 by solicitors in Edinburgh and Glasgow; since then academic lawyers and advocates have joined. One of the major achievements of this Trust is the report Computers for Lawyers (Aitken/Campbell/Morgan 1972). This report drew several conclusions about how a computer-assisted system might best be introduced into Scotland, cfr. Aitken/Campbell/Morgan 1972:135-136. The conclusions are in many ways of general interest, as they are based on the analysis of the legal information system and the needs of the users in a rather small legal system, i.e. the Scottish. (Cfr. also Campbell 1975 b.)

One of the conclusions was that an organization corresponding to the Trust including all of the United Kingdom ought to be established. And such an organization got under way on 11 December 1973 - The Society for Computers and Law Limited. In a statement issued by the Society, it is proclaimed that it "will try and monitor, where possible, the development of the use of computers for legal purposes and avoid haphazard development and proliferation of conflicting systems of information storage and retrieval, and other uses of computers for lawyers".

The activities of the Society indicate that it will serve the purpose of integrating the activities in the United Kingdom in order to arrive at some sort of general plan pointing toward the developing of legal information systems. The Society has its roots in the needs of the practising lawyer - therefore it is also interested in other types of computer systems of assistance to the lawyer. The Society has since the summer of 1974 published a newsletter, Computers and Law, reporting on the development in the United Kingdom.

By way of conclusion, we will only mention the activities in the field of

[Page 115 ]


computers and law at British universities. There are three centers of this activity, pivoting on three names: Colin Tapper is still at Oxford (Magdalen College), and has, among other things, written a comprehensive introduction to the subject of computers and law (Tapper 1973). Bryan Niblett - one of the founders of STATUS - is now Professor in computer science at the University of Swansea. Among his more recent works, one may especially mention his projects on vector-based text retrieval (cfr. Boreham/Niblett 1975 and Niblett/Boreham 1976). Thirdly, Colin Campbell - one of the authors of Computers for Lawyers - has become a Professor of Jurisprudence at Queen's University, Belfast, and will add impetus to the development of a legal information system based on QUOBIRD.

[Page 116 ]


6 A final look at North America

6.1 INTRODUCTION

We started our historical survey in the United States, and traced the outlines of the development till the end of the 1960s by describing a few key projects. We did not, however, end up with a characterization of the status today, in the mid-seventies. Below, we shall try to bring our survey up to date by examining three North American projects. We will not suggest that there are differences in principle between the projects described below and those above in section 4 - though it has been suggested that the legal retrieval projects of the United States may be categorized in "generations" (cfr. Mackaay 1973:103). But the three projects presented here resemble each other in the respect that they are - to some degree - commercial projects, aiming at furnishing the practising lawyer with a legal research service. This is in contrast to three of the North American projects discussed above, FLITE, RIRA, and JURIS, all of which are mainly serving branches of public administration.

6.2 DATUM1

1 We are indebted to Professor Philip Slayton of McGill University for his comments on section 6.2-3.

DATUM is an acronym for "Documentation Automatique des Textes juridiques de l'Université de Montréal", a joint research venture of the law faculty and the computing center intended to provide the legal profession as a whole with a bilingual computerized case retrieval system. The project was initiated at the end of 1968, and has been managed by Professor Ejan Mackaay of the law faculty.

The retrieval system was inspired by Horty's design of a full-text system, but was developed from scratch by the DATUM team (cfr. DATUM 1970). The query language is rather simple, including a full

[Page 117 ]


spectrum of Boolean operators and positional logic, cfr. Thibault-Iezzoni 1971. Special interest is associated with the thesaurus structure of the system, cfr. below. The system is implemented on a CDC 6400 computer.

In May 1971 a small scale test was initiated on 254 decisions of the Quebec Court of Appeal (Queen's Bench, 1951). The results were sufficiently encouraging to warrant the moving forward to a bigger data bank and launching a service in principle available to all Quebec lawyers by the end of 1970. Since July 1971 the system has operated on a data base of 140 million characters, cfr. Mackaay 1973:104, representing about 15 000 cases included in the three most utilized series of case reports in Quebec, the reports of the Supreme, Appeal, and Superior courts from 1945 to date. The inclusion of two more series of reports is planned, cfr. Tapper 1974:13, Boucher 1971:13. Legislation is not included in the DATUM data base. Another Quebec project, however, the MODUL project at the Université Laval, has been concentrating on this type of legal sources, cfr. Goulet/Houle/Leclerc-Houde 1971.

To begin with, DATUM was offered to the private lawyers - a total population of approximately 6 000 lawyers and notaries. It became apparent that the costs of the project could not be covered in this way, and since July 1973 the services have also been made available to judges and lawyers within the public administration. In return the government pays a fixed sum to the DATUM project. In September 1973 it was estimated that one-third of Quebec lawyers had become users of DATUM, and the number of questions accepted by DATUM had increased from 1 000 in 1972 to 6 000 in 1973, cfr. Tapper 1974:12.

The DATUM system was created in order to satisfy the needs of Quebec lawyers, a factor that is reflected in its design.

The first and major requirement is that of bi-lingualism. Canada has two official languages, French and English. For that reason (and especially in the province of Quebec), a legal retrieval system has to work with legal sources in both languages. This is not merely a problem in linguistics; as Mackaay has pointed out (1971:59), the terminology is also forged by two different legal traditions: civil and common law. This situation creates quite a complex synonymity problem.

This complex situation hardly favors a full-text solution. We have above in section 5.2 seen how the problem of bi-lingualism was solved in the Belgian system CREDOC: dictionaries in both Dutch and French translated the terms into a common set of concept-numbers, a solution made practical through the use of a controlled language for constructing

[Page 118 ]


document surrogates. The CREDOC solution is, it would seem, closely associated with an indexing philosophy.

DATUM chose to develop a unilingual thesaurus. In fact two thesauri were developed, the so-called s-thesaurus and the so-called g-thesaurus. The latter supplies a grammatical expansion of the words in a query: plural, feminine form, inflected form, etc. This thesaurus is rather conventional, and we will not pursue it further here (for a discussion, see Schwab 1971).

The s-thesaurus, however, is designed to extend a given term into a series of equivalent words or expressions in both English and French. This is necessary; though a lawyer may read both English and French texts without difficulty, it is not to be expected that he will be able to construct adequate queries with the same ease in both languages.

For developing this thesaurus, the method suggested by Irving Kayton (1966) was adopted. Selected passages from the cases to be included in the data base were examined manually, and key words or expressions were replaced by synonyms - a synonym being a word or an expression which in context does not change the meaning of the sentence. By this method, a great number of "source lists" were produced consisting of the term originally found in the text and the assigned synonyms. These lists were then merged by using programs which, when comparing the identity of the lists, decided whether two source lists could be merged into one common list for the source term, or whether they should be kept apart, representing different meanings of homonymous source terms. Thus, lists of synonymity in each language were arrived at. The source terms were then translated, and the lists of the corresponding source terms in the other language were merged with the lists in the first, creating a unilingual thesaurus with entries in both languages, and synonyms in both languages for any entry.
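The merging step described above can be illustrated with a minimal sketch. The text does not state how the DATUM programs compared source lists, so the overlap measure (Jaccard similarity), the threshold of 0.5, and all names and example words below are assumptions made purely for illustration.

```python
def jaccard(a, b):
    """Share of common elements between two sets (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_source_lists(source_lists, threshold=0.5):
    """Group source lists per source term into distinct meanings.

    source_lists: iterable of (term, synonyms) pairs, one pair per passage
    examined. Lists for the same term that overlap enough are merged into
    one meaning; the others are kept apart as homonymous meanings.
    The overlap test and the threshold are assumptions, not DATUM's rule.
    """
    senses = {}  # term -> list of synonym sets, one set per meaning
    for term, synonyms in source_lists:
        candidate = set(synonyms)
        for sense in senses.setdefault(term, []):
            if jaccard(sense, candidate) >= threshold:
                sense |= candidate  # same meaning: merge into the existing list
                break
        else:
            senses[term].append(candidate)  # new meaning of a homonymous term
    return senses


# Invented example: "action" as a lawsuit and "action" as a share of stock.
lists = [
    ("action", ["suit", "proceeding"]),
    ("action", ["suit", "lawsuit", "proceeding"]),
    ("action", ["share", "stock"]),
]
merged = merge_source_lists(lists)
```

In this invented example the first two lists share most of their entries and collapse into one meaning, while the third has nothing in common with them and is kept as a separate meaning of the same source term.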

The reciprocity of the synonym structure was used to group synonyms in two categories - close synonyms and less related synonyms. That is to say, if the source list of term A includes term B, and the source list of term B includes term A, there is reciprocity and close synonymity. If, on the other hand, term B occurs in the source list of term A, but not vice versa, term B is a less related synonym to A.
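The reciprocity rule lends itself to a short sketch. The data layout (a dictionary mapping each term to its source list) and the example words are assumptions chosen for illustration; only the rule itself is taken from the description above.

```python
def classify_synonyms(source_lists):
    """Split each term's synonyms into close and less related ones.

    source_lists: dict mapping a term to the set of synonyms in its
    source list. A synonym B of term A is "close" when A also appears
    in the source list of B (reciprocity); otherwise it is "less related".
    """
    result = {}
    for term, synonyms in source_lists.items():
        close, less_related = set(), set()
        for other in synonyms:
            if term in source_lists.get(other, set()):
                close.add(other)
            else:
                less_related.add(other)
        result[term] = {"close": close, "less_related": less_related}
    return result


# Invented example lists.
lists = {
    "contract": {"agreement", "deed"},
    "agreement": {"contract"},
    "deed": {"instrument"},
}
classified = classify_synonyms(lists)
```

Here "agreement" is a close synonym of "contract" because each occurs in the other's list, whereas "deed" is only a less related synonym, since the list for "deed" does not point back to "contract".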

In this way the s-thesaurus will, for a certain source word, list all registered meanings. For each meaning there will be a list of synonyms, organized on several levels: (1) precise translation, (2) close synonyms in the same language as the source word, (3) close synonyms to the translation under (1), (4) less related synonyms in the same language as the source word, and (5) less related synonyms to the translation under (1). The user may select to which level of synonymity he will expand his query.
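The five levels can be pictured as an ordered record per meaning, with query expansion taking everything up to the level the user selects. The sketch below assumes that a higher level includes all lower ones (the text does not say so explicitly); the class, the field names, and the example words are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Meaning:
    """One registered meaning of a source word, following levels (1)-(5)."""
    translation: str                                            # level 1
    close_same: List[str] = field(default_factory=list)         # level 2
    close_translated: List[str] = field(default_factory=list)   # level 3
    less_same: List[str] = field(default_factory=list)          # level 4
    less_translated: List[str] = field(default_factory=list)    # level 5


def expand(term, meaning, level):
    """Return the query terms for `term` expanded up to `level` (0 to 5).

    Level 0 leaves the term unexpanded. Cumulative expansion is an
    assumption made for this sketch.
    """
    tiers = [
        [meaning.translation],
        meaning.close_same,
        meaning.close_translated,
        meaning.less_same,
        meaning.less_translated,
    ]
    terms = [term]
    for tier in tiers[:level]:
        for word in tier:
            if word not in terms:  # keep order, avoid duplicates
                terms.append(word)
    return terms


# Invented English/French example.
contract = Meaning(
    translation="contrat",
    close_same=["agreement"],
    close_translated=["convention"],
    less_same=["deed"],
    less_translated=["acte"],
)
query_terms = expand("contract", contract, 3)
```

With level 3 the invented query for "contract" is widened by the translation and the close synonyms in both languages, while the less related synonyms are left out until the user asks for level 4 or 5.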

[Page 119 ]


To our knowledge, this thesaurus represents the major attempt at present to create a bi-lingual legal information system on a full text basis. The departure point is important - the substitution of synonyms for words in context as they actually occur in the material to be stored. In this way, it seems reasonable to hope that the synonymity problem has been solved more satisfactorily than would have been the case if synonyms had been substituted on a more general level.

The creation of the s-thesaurus was finished in May 1972 (cfr. Mackaay 1973:104). The philosophy behind it is described in more detail by Boucher et al. 1970:9-16, Mackaay 1971, Schwab 1971, and Mackaay 1973b.

Several similarities may be found between the DATUM design and the design of DOCILIS by CEDIJ, France - cfr. above at section 5.3.2. This is by no means accidental: the results available from CEDIJ seem to have been carefully examined by the DATUM project. There are, however, basic differences - for instance, DOCILIS operates with a hierarchically structured synonym thesaurus, while DATUM has settled for a term-to-term approach - cfr. Mackaay 1973b:3-4. Other differences arise, of course, out of the bi-lingualism of DATUM and different soft- and hardware.

DATUM will in the near future change some of its basic design. A new version has been announced for 1977. This version will include on-line facilities (cfr. below), but a normalized vocabulary will also be introduced as an alternative to the full text, for more economical operation of the data base. A novel method for assigning different weights to words according to feedback from users will also be adopted, cfr. Mackaay 1976. After the reorganization of DATUM mentioned below, a yearly increase of 2 000 cases is expected.

The second point of principal interest with respect to the DATUM project is its interface with the users. At the moment it is the only legal information system in North America operating on a service bureau basis. Questions may be formulated in writing (cfr. Slayton 1974:27-28 for an example of the forms employed), or a consultant at DATUM may be contacted and the question formulated in a dialogue with the consultant. The query is then constructed by the consultant and batch processed. The resulting output (several optional output formats are available, cfr. Slayton 1974:11-12 and 29-32 for examples) is then returned by mail to the user accompanied by the comments of the consultant. There is also a follow-up session, in which the consultant discusses the result with the user (cfr. Tapper 1974:12).

DATUM actually also keeps files of past searches as a tool to improve the quality. If an identical question emerges, one may of course supply the material retrieved earlier. And elements of earlier searches may also be relevant. In this way, DATUM may - as Tapper (1974:12-13) puts it - create a post facto thesaurus.

[Page 120 ]


The justification of the service bureau model - which we have met above in respect of several European systems, for instance the Belgian CREDOC, cfr. above in section 5.2 - is the user needs. Canadian user research (for instance Operation Compulex 1972) has demonstrated that the legal information situation is least satisfactory in outlying areas and at small law offices. With respect to such users, it is not realistic to expect that they will acquire terminal equipment, which would mean considerable financial investment, cfr. Mackaay 1973:105-106. This is certainly an argument of considerable weight. As we have shown by our model of the legal decision process, availability factors play an important role in determining the coverage of the information system for a user. Distance is just one such trivial, but nevertheless all-important, availability factor. Cost is another. Introducing an improved legal information system that is in principle available all over a country will have little effect if "distance" is just replaced by "high cost". Actually, the users would probably come to be large law firms in central areas, just those users who already enjoy a better information situation. In this way, the introduction of a commercial, computer-based legal research service may enhance undesirable trends in the total legal information system, and further impair the "rule of law" (cfr. for instance the section on coverage discrimination, below in section 12.2.3). The service bureau model is, of course, just one way of solving this problem (another would be to heavily subsidize a terminal network). But it is an obvious and simple solution. The users do not have to commit themselves to great investments (a question averages $35); the telephone is at hand to make contact with the DATUM bureau - a province-wide computer-based legal information service has been established.

Actually DATUM is now planning to go on-line. This will mainly facilitate the job of the DATUM consultant, who will not have to rely on batch runs, cfr. Tapper 1974:13, but bigger law firms may also be expected to acquire their own terminals, cfr. Mackaay 1973:106.

DATUM has recently been reorganized. A Quebec Information Council has been created, which is to be responsible for the integration of the various legal information services in the province. Professor Ejan Mackaay will be the first director of this council. It is expected that the DATUM and MODUL projects will be merged (MODUL is the legislative information system developed by Université Laval), as well as the various series of Quebec law reports and the Minibiblex microform service (which is already being employed to a great extent by DATUM

[Page 121 ]


consultants). Actually, DATUM has also for some time marketed hard copy printouts in selected areas of law; 23 such packages are being sold in editions of 50-200 copies ($25-75 per copy), cfr. Tapper 1974:12.

We see once more how a project has grown from an initial limited effort in documenting case law to a province-wide legal information system, in which computer-based and conventional services are viewed as pieces of the same jigsaw puzzle. This is partly due to the effect of introducing a new technology: the basic design of a legal information system has been scrutinized, and flaws have been uncovered which are independent of the technology employed.

6.3 QL-SYSTEMS

Since 1961, the Queen's University, Kingston, has been engaged in a Treaty Project, collecting and annotating all of the treaties of the British Commonwealth. About 18 000 (1970) detailed treaty records have been prepared. Since 1967 computerized text-editing has been used to add information from these records. The Treaty Project has become a major activity, as treaty registers for a number of developing countries were prepared from these records - cfr. Lawford 1968, Lawford/Latta/von Briesen 1970:3, Tapper 1973:279-80.

The enthusiasm of Professor Hugh Lawford was a moving force behind the Treaty Project. In 1968 he initiated another project to become known as QUIC/LAW - acronym for "Queen's University Institute for Computing and Law". Following an exchange of letters in late 1968, IBM Canada and Queen's University launched a study of potential applications of computer-based systems for legal information retrieval.

A production organization for converting legal sources into machine-readable form was established, and the conversion took place at a high rate. A number of contacts were made with other universities, publishers, etc. to secure material, as QUIC/LAW aimed at growing into a nationwide legal research service.

In 1972 QUIC/LAW was set up as a commercial system to market its services to lawyers. And the retrieval system offered was of a somewhat different basic design than the other systems described, which to a great extent have relied on Boolean operators. QUIC/LAW decided to base their system on ranking algorithms.

To start with, QUIC/LAW was attracted to the IBM system INFORM/360, an unreleased program developed for IBM internal

[Page 122 ]


corporate headquarters' use in Armonk, New York, cfr. Slayton 1974:10, Lawford/Latta/von Briesen 1970:3. Several difficulties were encountered in adapting INFORM/360 for QUIC/LAW use, mostly in connection with operating system incompatibility. Though redesigned, the basic characteristics of this program have been retained.

Ranking algorithms are based on the assumption that the number of word-matches between a query and a document is an indication of relevance. A simple algorithm would be to just count the number of words included in the query which also occurred in the document. The document scoring most "hits" would be ranked first, and so on down to the last document, the lowest rank possible being assigned to a document having just one word from the query occurring once, cfr. below in section 10.5.5.
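The simple hit-counting algorithm described above can be sketched as follows. This is only an illustration of the principle, not one of the QUIC/LAW algorithms; the function names and the three sample documents are hypothetical.

```python
import re

def tokenize(text):
    """Lower-case a text and split it into words."""
    return re.findall(r"[a-z]+", text.lower())

def count_hits(query, document):
    """Count every occurrence in the document of any word from the query."""
    query_words = set(tokenize(query))
    return sum(1 for word in tokenize(document) if word in query_words)

def rank_by_hits(query, documents):
    """Return (name, hits) pairs, most hits first; documents without hits are dropped."""
    scored = [(name, count_hits(query, text)) for name, text in documents.items()]
    scored = [(name, hits) for name, hits in scored if hits > 0]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical sample documents.
documents = {
    "case_a": "The car struck a second car; the accident was at the crossing.",
    "case_b": "The accident involved one car.",
    "case_c": "A dispute over a lease.",
}
ranking = rank_by_hits("car accident", documents)
```

Here `case_a` is ranked first with three hits, `case_b` second with two, and `case_c` is not retrieved at all.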

It is easy to see that such a simple word-frequency ranking algorithm would hardly be adequate, since a long document - including a great total number of words - would have a greater probability of being ranked first than a shorter document, this being due to its greater length, not to a correspondingly greater probability of being relevant. Much more sophisticated algorithms have therefore been developed, taking into account the length of the document, the frequency of occurrences within the document compared to the frequency of occurrences within the whole data base, etc. Altogether eleven alternative ranking algorithms are available in QUIC/LAW.
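A minimal sketch of how document length and data base frequency may be taken into account is given below. The weighting used (relative frequency within the document multiplied by the logarithm of the inverse document frequency) is a common textbook formula chosen for illustration; it is an assumption, and is not claimed to be any of the eleven QUIC/LAW algorithms. All names and sample texts are hypothetical.

```python
import math
import re

def tokenize(text):
    """Lower-case a text and split it into words."""
    return re.findall(r"[a-z]+", text.lower())

def weighted_scores(query, documents):
    """Score each document as the sum, over the query words, of
    (occurrences in document / document length)
    * log(number of documents / number of documents containing the word)."""
    tokens = {name: tokenize(text) for name, text in documents.items()}
    n_docs = len(tokens)
    scores = {}
    for name, words in tokens.items():
        score = 0.0
        for term in set(tokenize(query)):
            in_doc = words.count(term)
            containing = sum(1 for other in tokens.values() if term in other)
            if in_doc and containing:
                score += (in_doc / len(words)) * math.log(n_docs / containing)
        scores[name] = score
    return scores

# Hypothetical data: a long document with two hits, a short one with one hit.
documents = {
    "long": "the car " + "filler " * 50 + "car",
    "short": "car crash",
    "other": "lease dispute",
}
scores = weighted_scores("car", documents)
```

A plain hit count would rank `long` (two hits) above `short` (one hit); with the length correction the order is reversed, since the single hit makes up half of the short document.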

It will be noted that the ranking algorithms make feasible what may be called a natural-language-based retrieval strategy: the user is not obliged to structure his query according to a set of norms, but may formulate it as he would if he were jotting it down on paper for a colleague. This certainly simplifies the user interface, but only at the cost of reduced performance - as the information inherent in the structure would not be available for the improvement of retrieval performance (cfr. Bing/Harvold/Kjønstad/Stabell 1976 for a discussion of natural-language-based retrieval strategies, and below in section 10.5.1).

Boolean operators are employed in QUIC/LAW as restrictions. Query words may be combined by Boolean operators in order to specify that documents not including a combination of certain words (for instance where only A, not B occurs) should not be considered retrieved and included in the ranked array of documents. Positional operators are not available, however, so there is no guarantee that two words combined by AND have a corresponding semantic relationship in the document. (This

[Page 123 ]


is a risk which may be reduced, but not eliminated, by positional operators.)
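The use of Boolean operators as restrictions on a ranked result can be sketched as follows. This is a simplified illustration under our own assumptions (the parameter names `must_have` and `must_not_have`, the hit-count ranking and the sample documents are all hypothetical), not a reconstruction of the QUIC/LAW programs.

```python
import re

def tokenize(text):
    """Lower-case a text and split it into words."""
    return re.findall(r"[a-z]+", text.lower())

def restricted_ranking(query_words, must_have, must_not_have, documents):
    """Rank documents by hit count, but only those passing the Boolean restriction."""
    ranked = []
    for name, text in documents.items():
        words = tokenize(text)
        present = set(words)
        if not set(must_have) <= present:      # AND: every required word must occur
            continue
        if present & set(must_not_have):       # NOT: no excluded word may occur
            continue
        hits = sum(1 for word in words if word in query_words)
        if hits:
            ranked.append((name, hits))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

# Hypothetical sample documents.
documents = {
    "d1": "Car accident on the road, the car was damaged",
    "d2": "Car insurance after an accident",
    "d3": "A car was sold",
    "d4": "Accident involving a car",
}
result = restricted_ranking({"car", "accident"}, ["car", "accident"], ["insurance"], documents)
```

Note that `d1` and `d4` pass the restriction merely because both words occur somewhere in the text; nothing in the test checks that the words are related to each other.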

For examples of searches conducted on QUIC/LAW, see Slayton 1974:16-17, printouts 33-35, and Lawford 1973:72-93 with rather detailed examples of retrieval strategies and output formats. The QUIC/LAW system is terminal oriented and conversational, response-time being maximum 30 seconds and average 10 seconds, cfr. Lawford 1973:70.

The retrieval capabilities offered by ranking algorithms based on word frequency are discussed elsewhere in this book, cfr. below at section 10.5.5. Several systems do in fact include ranking algorithms; the IBM system STAIRS - used by a number of European legal information retrieval systems - has for instance five optional algorithms, which may be replaced by algorithms of the user's own design. We may also point out that a system based on retrieval by word-frequency ranking algorithms has much in common with a system based on vector retrieval, cfr. below at section 10.5.5, as for instance the Swiss system CONTEXT 70, cfr. below in section 7.6.

As mentioned, the QUIC/LAW project was conceived as the start of a national Canadian legal information system. But only a year after it was set up as a commercial enterprise, in March 1973, IBM and Queen's University withdrew their support. The Canadian Ministry of Justice supported the system for a three-month test to measure the usage of the system by lawyers and to assess the feasibility of an expanded QUIC/LAW service, operated either by the Department or a private company.

The test included 16 terminals, located in law firms and public administration agencies (including the Ministry of Justice) in Ottawa and Toronto. The available data base was composed of Supreme Court Reports from 1923 (145 000 000 characters), Federal Court Reports (6 000 000 characters), Revised Statutes of Canada (33 350 000 characters), Federal Statutory Orders and Regulations (28 000 000 characters), and 28 000 bibliographical index records of the Queen's University Treaty Project. (Cfr. QUIC/LAW 1973.)

The final report of the test is in its own right a valuable product, but the Ministry of Justice proved unwilling to prolong its support of the QUIC/LAW project. Rights in the project were transferred to a new company - QL-systems - set up in Ottawa by the staff that had been working on the project, principally Professors Lawford and von Briesen (cfr. Tapper 1974:2).

QL-systems have entered into cooperation with West Publishing Company (cfr. Tapper 1974:3-4, 1976:9), and have had changes made in the retrieval programs. Such changes include full positional logic, capability

[Page 124 ]


of using phrases as query terms, capability of storing queries, etc. A number of these improvements are results of the user experiment, cfr. QUIC/LAW 1973:8-9.

In 1975 West announced its Computer Law Retrieval Service (cfr. Ginnow 1975), launched under the name of WESTLAW (cfr. Tapper 1976:9). The data base of this service is the headnotes of cases, as reported in West publications. It is argued that this gives better retrieval capabilities and facilitates the relevance assessment of the user. Through West, QL-systems has become a competitor on the commercial market of the United States. It is also West's way of celebrating the completion of its first century of service to the American legal profession.

But on this market there is already one very successful commercial legal research service. In the following section we shall describe this service, and also make a few comments on the competitive situation created by West's move.

We have earlier discussed West's confrontation with another commercial venture, the Law Research Service, which during its bankruptcy proceedings entered into an agreement with West, referred to in section 4.4.2.

6.4 LEXIS2

2 We are indebted to Jerome S. Rubin, President of MDC, and Ms. Janice C. Teisberg of the MDC for comments on this section.

In 1964 the Ohio Bar Association created a working group for evaluating the possibility of employing computers in legal research. This working group concluded in 1967 that no satisfactory system existed, and that Ohio lawyers probably would have to create a new system. A subsidiary of the Ohio State Bar Association was founded as a non-profit organisation under the name of OBAR, an acronym for Ohio Bar Automated Research Corporation (cfr. Asman 1973, Harrington/Wilson/Bennet 1971:184). OBAR signed a contract with Data Corporation of Dayton, Ohio, which had developed a system in 1964 for the retrieval of Air Force reconnaissance documents (cfr. Rubin 1973:2). This retrieval system was to be developed further until it was found satisfactory in relation to OBAR's standards.

In the summer of 1968, Data Corporation was acquired by the Mead Corporation - a multi-national, diversified corporation with a very solid economy. Mead soon thereafter committed itself to underwrite computer-based legal information systems. To manage the project, the

[Page 125 ]


Information Systems Division of Data Corporation was incorporated as a wholly-owned subsidiary, Mead Data Central, Inc.

In 1969 a prototype of the system became operational, and 15 teletype terminals were placed in lawyers' offices. MDC conducted intensive studies of how users exploited the system - and on the basis of the "Ohio experiment" it was concluded that the system introduced by Mead and OBAR was economically feasible and an effective approach to computer-based legal research. The system was a conversational full-text system with Boolean and positional operators.

The teletype terminals crippled the first prototype of the system. In 1970 MDC replaced these with video display terminals, accompanied by hard copy printers, and with additional functional features, including effective browsing capabilities, cfr. Rubin 1973:20. At one time, the video terminal actually used colors for highlighting query terms displayed in the text (cfr. Tapper 1973:191), but the version operational at present highlights such terms by displaying them as black letters on a field of white. For a description of this version, see Lancaster/Fayen 1973:37-38, cfr. 460-466 for search instruction, and 76-77 for characteristics.

In 1971, MDC acknowledged that a national, full-text interactive service should be the goal, cfr. Rubin/Krebs/Woodard 1973:16.24. It was stipulated, however, that the introduction of such a system should only take place by participation from the organized bar, which was all the more natural as the system had its origin in the bar and MDC had been working in close understanding with OBAR all the time.

The system, which had become informally known as the OBAR system, was now redesigned - and at the beginning of 1973 it was launched as a national computer-based legal research service under the name LEXIS. Since then it has grown very rapidly. As of May, 1976, there are more than a hundred terminals in daily use. Nearly 14 000 lawyers, judges, and tax accountants, as well as over 6 000 law students, have been trained to use LEXIS. MDC has subscribers in 20 states throughout the United States. The subscribers include law firms, accounting firms, state government agencies, bar associations, law schools, and such federal agencies as the Internal Revenue Service, the Securities and Exchange Commission, the Federal Trade Commission, the Supreme Court, the Tax Court, and several United States Courts of Appeals. From this it will be seen that LEXIS is trespassing on ground partly covered by the other systems RIRA and JURIS - though the data bases make these services supplementary rather than competitive.

[Page 126 ]


Three features of the LEXIS system should be pointed out: the emphasis put on hardware and reliability, the philosophy behind the retrieval programs, and the way the challenge of data base coverage has been tackled.

LEXIS operates out of a dedicated computer installation in Dayton, Ohio (an IBM 370/155 with another IBM 370/155 as backup). All the software has been written by MDC - including the terminal handling programs and modifications in the operating system. In this way LEXIS has got what is said to be the most reliable IBM computer installation operative today. The system is available through video terminals in the office of the lawyer. The terminals are especially designed for LEXIS, and feature a keyboard with a number of non-standard function keys, and a hardcopy-printer, cfr. Rubin 1973:24. They are also designed so as to reassure the user that no technical know-how is necessary for using the equipment.

Response time is very short - less than 15 seconds for more than 90 per cent of all search requests, and 1 second or less for well over 98 per cent of all other commands, cfr. Rubin 1973:31. Performance specifications, including response times, are guaranteed in MDC's sponsoring organization contracts, which are incorporated in the agreements with subscribers.

The system is fast and it is reliable. It should be recognized that this basic reliability is only achieved through major investments, and that it only indirectly affects the legal retrieval system. Actually this is a condition which a commercial system must meet if customers are to rely on its services.

The second feature is the philosophy behind the retrieval system itself. LEXIS has been characterized as a "computerized reading glass", and this is in many respects a good summing up of the basic design. Great care has been taken to provide fast access to retrieved documents, to facilitate browsing, etc. A KWIC format is for instance available - that part of the retrieved document which contains the terms of the query is presented on the screen with the terms highlighted. This has become standard on some other systems (for instance the Swedish IMDOC), and obviously meets a need among lawyers, who found relevance assessment much faster. A fast system with good browsing facilities makes it possible to flip forward and backward in a file of retrieved documents - an improvement of the way in which a lawyer will thumb through cases in a conventional library (cfr. Rubin 1973:31).
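The principle of a KWIC (key-word-in-context) display can be sketched in a few lines. This is a minimal illustration under our own assumptions: the function name, the window width and the sample sentence are hypothetical, and upper-case letters stand in for the highlighting on the LEXIS screen.

```python
import re

def kwic(text, query_terms, width=3):
    """Return one line per occurrence of a query term, showing `width` words of
    context on each side, with the term itself written in upper case."""
    words = text.split()
    terms = {term.lower() for term in query_terms}
    lines = []
    for i, word in enumerate(words):
        # Strip punctuation before comparing the word with the query terms.
        if re.sub(r"\W", "", word).lower() in terms:
            left = words[max(0, i - width):i]
            right = words[i + 1:i + 1 + width]
            lines.append(" ".join(left + [word.upper()] + right))
    return lines

# Hypothetical sample sentence.
text = ("The plaintiff claims that the defendant was negligent "
        "in maintaining the brakes of the vehicle.")
lines = kwic(text, ["negligent"], width=2)
```

With a window of two words the single line produced is `defendant was NEGLIGENT in maintaining`, which is usually enough for a first assessment of relevance without reading the whole document.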

[Page 127 ]


The KWIC format seems to have aroused enthusiasm among lawyers. One of the special uses to which this format is put, is the listing of citations. Using a citation as query, all documents citing the case will be retrieved and the bit of text embedding the citation may be printed out. Preston (1971:191) points out with gusto that what earlier would require "lawyer time" may now "be done by a girl at the console" (!) - coining the term "obarizing" for this retrieval strategy. (The strategy is now informally called "LEXISing" - and MDC stresses that the lawyer himself can best determine the usefulness of the citations presented on the video display.)

Queries are formulated in a conventional manner using Boolean and positional operators. A query may be formulated in a series of "levels", and earlier levels may be amended to modify a query. We might have expected to see sophisticated ranking algorithms and other exotic retrieval strategies available in a system of this scope. MDC has, however, chosen a rather simple user interface, finding that most searches are simple, and that the emphasis rather should be on the "reading glass" facilities. The claim that in a fast system a user can afford to flip through a few irrelevant documents is not unreasonable, and at the same time it is pointed out that the final relevance assessment always has to be the user's. Though excellent arguments are on the side of MDC, one may all the same expect to see improvements also in this respect.

The education of users and co-operation with the bar association may also be mentioned here. It is recognized that the education of a user is crucial and difficult - MDC has educational programs that are sold along with the terminal equipment in order to give the user a headstart. And through the co-operation with bar associations, MDC has access to user opinion and experience. This continuous feedback has certainly been one of the factors determining which part of the system should be given the highest priority.

Apart from the co-operation with OBAR, contracts are entered into with the National Centre for Automated Information Retrieval (NCAIR), the successor of Lawyers' Centre for Electronic Legal Research, cfr. Flavin 1973; MOBAR, and TexLex (the sponsoring affiliates of the Missouri and Texas bars), cfr. Rubin 1973:12, and Harrington 1974 on NCAIR's involvement. MDC had by 1976 also sponsorship arrangements with IBAR, a joint subsidiary of the Illinois and Chicago Bars; and the American Institute of Certified Public Accountants (AICPA).

No objective evaluation of the LEXIS system has been published, but even experienced experts like Tapper (1974:16) have been impressed. Of the Ohio lawyers using the system, 78 per cent reported that it was at least

[Page 128 ]


5 times as fast as conventional research, and 41 per cent found it at least 13 times as fast. 94 per cent found that the results were at least as comprehensive as in conventional research, 68 per cent found it more comprehensive, and 28 per cent claimed that the result was much more comprehensive - cfr. Rubin 1973:8.

The data base of LEXIS is constantly growing. Our latest report (Musselman 1976) states that there are 4 federal and 6 state libraries. The federal libraries include

  • A general federal library, consisting of all reported decisions of the Supreme Court of the United States from 1938 to the present, the Court of Appeals from 1959 to the present, the District Courts from 1970 to the present, and the United States Code
  • A federal tax library
  • A federal securities library
  • A federal trade regulation library

The state libraries include the statutory and case law of Illinois, New York, Ohio, Missouri, Pennsylvania, Texas and Kansas. Libraries for California, Massachusetts and Delaware corporation law are being established, cfr. Rubin 1976:5.

Altogether this amounted to a data base exceeding 6 000 000 000 characters, making the LEXIS data base the largest full-text legal data base at present.

MDC has made a considerable effort to create a convenient structure in the data base. The data base is segmented into libraries, corresponding to states or areas of federal law. Each library is divided into files - this structure being determined in cooperation with the users.

For instance the Ohio library contains the following files:

  • Ohio Revised Code
  • Constitution of Ohio
  • Ohio State Reports
  • Ohio Appellate Reports
  • Ohio Miscellaneous Reports

    A file is composed of documents, each document being further divided into segments. The segments are naturally occurring divisions within the document - for instance the name, the date, the citation, the majority opinion, the dissenting opinion, and the names of the judges. Segments may be searched and displayed separately from the documents containing them, cfr. Rubin 1973:15-19.
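The hierarchy of library, file, document and segment, and a search restricted to one segment, can be sketched as below. The class and function names, the case names and the texts are hypothetical and serve only to illustrate the structure; nothing is implied about how LEXIS stores its data.

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    segments: dict  # segment name -> text of that segment

@dataclass
class File:
    name: str
    documents: list = field(default_factory=list)

@dataclass
class Library:
    name: str
    files: list = field(default_factory=list)

def search_segment(library, segment, word):
    """Return (file name, document) pairs whose named segment contains the word."""
    found = []
    for file in library.files:
        for document in file.documents:
            text = document.segments.get(segment, "")
            if word.lower() in text.lower().split():
                found.append((file.name, document))
    return found

# Hypothetical contents of one file in the Ohio library.
ohio = Library("Ohio", [File("Ohio State Reports", [
    Document({"name": "Smith v. Jones", "date": "1970",
              "majority opinion": "The driver was negligent",
              "dissenting opinion": "The driver was careful"}),
    Document({"name": "Doe v. Roe", "date": "1972",
              "majority opinion": "The owner was careful"}),
])])
in_dissent = search_segment(ohio, "dissenting opinion", "careful")
in_majority = search_segment(ohio, "majority opinion", "careful")
```

The same word thus retrieves different documents depending on the segment searched: only the first case when the search is limited to dissenting opinions, only the second when it is limited to majority opinions.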

    [Page 129 ]


    Data capture on such a scale as required by MDC, is - of course - a major operation. An accuracy standard of 99.99 per cent is imposed on the material. This is achieved by double registration and automated error checks. For retrospective documentation, MDC employs sources of labor overseas (cfr. Tapper 1974:15), for new material MDC encourages sponsoring professional organizations to find solutions for capturing data at the source (cfr. Rubin 1973:20).
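The idea behind double registration can be illustrated by a small sketch: the same text is keyed twice, independently, and the two versions are compared automatically so that only the disagreements need manual inspection. The procedure actually used by MDC is not documented here; the function name and sample texts are hypothetical.

```python
def disagreements(first_keying, second_keying):
    """Return (position, first word, second word) wherever the two keyings differ."""
    first, second = first_keying.split(), second_keying.split()
    diffs = [(i, a, b) for i, (a, b) in enumerate(zip(first, second)) if a != b]
    if len(first) != len(second):
        # A word was dropped or added; flag the point where the shorter text ends.
        diffs.append((min(len(first), len(second)), "<length mismatch>", "<length mismatch>"))
    return diffs

to_check = disagreements("the court held that", "the court hold that")

# An accuracy of 99.99 per cent allows one error per 10 000 characters,
# i.e. 100 errors in a text of 1 000 000 characters.
allowed_errors = 1_000_000 // 10_000
```

Since it is unlikely that two operators make the same mistake at the same place, most keying errors show up as disagreements.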

    The cost of the system is a sum of several factors. Pursuant to its contracts with private and most government LEXIS subscribers, MDC charges nonrecurring fees for terminal installation and user training, monthly fees for equipment and communications, hourly fees for use of the service, and incidental fees for off-line printing. Since 1974, LEXIS services may be obtained with no minimum use commitment.

    After this brief description of the LEXIS system, let us return to the basic characteristics. LEXIS has emphasized a reliable service with short response time and a high coverage. The custom-designed terminal facilitates browsing functions, which also exploit the fast response time of the system. The retrieval language is, in contrast to the sophistication demonstrated in other systems, rather simple.

    The change in emphasis distinguishing LEXIS from many other systems may be due to the difference in approach between an operative commercial system, and systems that have their origin in university environments, or are run on the enthusiasm of the staff, which is aroused by a curiosity directed toward alternative methods for improving system performance through novel retrieval strategies.

    As we shall discuss in further detail in part IV, one of the basic quality standards of an information system is the coverage. It seems to be a sound judgement of MDC to invest in coverage.

    As mentioned above in section 6.3, West has announced its WESTLAW. WESTLAW will certainly become LEXIS' major competitor. When placing the two systems in a competitive perspective, however, one should also notice that in research in Federal cases generally, LEXIS represents the first viable alternative to West's Federal Reporter System (Rubin 1976:8).

    There are a number of differences between WESTLAW and LEXIS. The retrieval strategies available at WESTLAW (incorporating the original QL-systems options) will probably be somewhat more diversified. Little is known about the comparative reliability. West may, through integration with their publishing service, also capture the data at the source at small extra costs. A first comparative study is Sprowl 1976.

    [Page 130 ]


    The major difference will be, however, that the LEXIS data base includes case law in full text, while WESTLAW's will only include headnotes. The discussion of which is the better alternative is not to be avoided. This discussion will cover a number of issues: which of the two provides the better retrieval capabilities of a system, how much added inconvenience will the necessity of looking up the actual text of possible relevant cases represent to the users, and the deficiencies implied by lack of objectivity in abstracts (cfr. below at section 12.5.3). West may rely on the legal tradition, as lawyers are used to the indexes and methods applied by this major legal publishing house. On the basis of available experimental results it would appear rather surprising, however, if a data base of headnotes proved to give the system better retrieval capabilities than a full-text data base.

    There will probably be an interesting confrontation on the United States market, where the advantages and disadvantages of the two systems will be disclosed - not only in regard to system performance, but also with respect to other factors equally important to a commercial service, such as for instance reliability and cost.

    [Page 131 ]


    7 Five reference retrieval systems

    7.1 INTRODUCTION

    Below, we shall give a very brief description of five different reference retrieval systems, all of which are commercially available. The descriptions are partly supplementary to the historical survey above, as some of the systems are used in several of the projects mentioned. But also this brief description may serve as an illustration, highlighting some points of difference in design between retrieval languages. This will, we hope, especially be of some use to the reader not already familiar with this type of systems.

    7.2 STAIRS1

    1 We are indebted to Mr. Ole-Jørn Bryn of IBM-Norway for comments on this section.

    7.2.1 Characteristics

    STAIRS is an acronym for Storage and Information Retrieval System, and is a general purpose retrieval system developed by IBM. The system operates on IBM 360 or 370 computers under Customer Information Control System (CICS), or Information Management System (IMS).

    STAIRS is a terminal-oriented interactive system; retrieval is usually conducted on an IBM video display terminal which has function keys for the most used functions (like .. SEARCH, .. SELECT, .. BROWSE, .. RANK, etc.). The user constructs his query, receives a response from the system, and may modify his query, utilizing the feedback information.

    STAIRS may be used on any type of stored texts, but does encourage a full-text approach, as every word of the stored documents may be used in a query as search criteria. Excepted from this may be a list of stop-words predefined by the user.

    STAIRS offers several optional retrieval strategies to the user. In a .. SEARCH query words may be combined by Boolean operators (AND,

    [Page 132]


    OR, NOT, XOR). Special operators may be used for specifying that two words should occur in the same paragraph (SAME), the same sentence (WITH), or adjacent (ADJ); these operators providing a limited positional logic.
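The three levels of positional logic can be sketched as follows. This is a simplified model under our own assumptions: paragraphs are taken to be separated by blank lines and sentences by full stops, ADJ is assumed to require the two words in the given order, and all function names and the sample text are hypothetical. It is not a description of the STAIRS programs.

```python
import re

def parse(text):
    """Split a text into paragraphs, each a list of sentences, each a list of words."""
    paragraphs = []
    for paragraph in text.split("\n\n"):
        sentences = [re.findall(r"[a-z]+", s.lower()) for s in paragraph.split(".")]
        paragraphs.append([s for s in sentences if s])
    return paragraphs

def same(doc, a, b):
    """SAME: both words occur in the same paragraph."""
    return any(any(a in s for s in p) and any(b in s for s in p) for p in doc)

def with_(doc, a, b):
    """WITH: both words occur in the same sentence."""
    return any(a in s and b in s for p in doc for s in p)

def adj(doc, a, b):
    """ADJ: word b immediately follows word a (order assumed significant)."""
    return any((a, b) in zip(s, s[1:]) for p in doc for s in p)

# Hypothetical sample text with two paragraphs.
doc = parse("The car was parked. An accident happened nearby.\n\n"
            "The traffic accident report was filed.")
checks = (same(doc, "car", "accident"), with_(doc, "car", "accident"),
          adj(doc, "traffic", "accident"))
```

In the sample text "car" and "accident" satisfy SAME but not WITH, while "traffic" and "accident" satisfy ADJ; each operator is thus a stricter test than the one before.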

    Every query is given a number by the system. These numbers may be included in new queries, providing the facility of using queries as building blocks.
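The use of numbered queries as building blocks may be illustrated by the following sketch, in which each result is stored under its statement number and may be used as an operand in a later query. The class, its methods and the toy index are hypothetical; as a simplification, any operand consisting only of digits is treated as a statement number, which is our own assumption and not the STAIRS syntax.

```python
class Session:
    """Keeps the result of every query under its statement number."""

    def __init__(self, index):
        self.index = index    # word -> set of document numbers
        self.results = {}     # statement number -> set of document numbers

    def _operand(self, token):
        if token.isdigit():   # reference to an earlier statement
            return self.results[int(token)]
        return self.index.get(token.lower(), set())

    def search(self, left, operator, right):
        a, b = self._operand(left), self._operand(right)
        if operator == "AND":
            result = a & b
        elif operator == "OR":
            result = a | b
        elif operator == "NOT":
            result = a - b
        elif operator == "XOR":
            result = a ^ b
        else:
            raise ValueError("unknown operator: " + operator)
        number = len(self.results) + 1
        self.results[number] = result
        return number, result

# Hypothetical inverted index.
session = Session({"car": {1, 2, 3}, "accident": {2, 3, 4}, "insurance": {3}})
first = session.search("car", "AND", "accident")
second = session.search("1", "NOT", "insurance")
```

The second query reuses the result of statement 1 and removes from it the documents mentioning "insurance", leaving document 2.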

    .. SELECT queries provide retrieval on predefined formatted fields associated with the documents. These fields may contain numerical or textual information, and special operators will relate query information to such fields (for instance "greater than", "within limits", "equal to", etc.). .. RANK queries give the user access to five optional ranking algorithms based on word frequency (see Bing/Harvold/Kjønstad/Stabell 1976 for an assessment of these algorithms).

    The retrieved documents are displayed on the terminal screen by using the .. BROWSE-function. Several sub-commands of .. BROWSE are available: the whole document may be displayed, or only certain segments. .. BROWSE also includes commands for having hardcopy produced locally or centrally. Function keys facilitate paging. Terms included in the query are highlighted on the screen when displayed as part of a document.

    A general description of STAIRS is available from IBM - for description in the literature of legal informatics, see Furth 1973 (English), or Fietzek 1974 (German).

    STAIRS is used in several operative legal information systems, for instance by CEDIJ (above in section 5.3.2), CELEX (above in section 5.1.3), and DATEV (above in section 5.6.3). It has also been employed in a user experiment at the Norwegian Social Security Court (cfr. Brukerforsøk 1975). The Australian government has opted for STAIRS on the basis of the report of the Committee on Computerization of Legal Data (1974).

    In order to make STAIRS even more attractive to legal information systems, IBM-Austria has - in cooperation with the Austrian government - developed an additional program called FAIR (an acronym for Fully Automated Information Retrieval). This program has four major functions:

    • it provides the user with an automatic generation of grammatical variations (Flexionsformengenerierung),
    • it contains a conceptual thesaurus (structures and maintenance

    [Page 133]


    routines which facilitate the creation of a thesaurus defined by the user himself),
    • a composite word thesaurus (a set of tables generated manually - in German composite words are created by merging different words into a longer word in contrast to English, where they are constructed with vehicular words like "of" and "by"),
    • and facilities for retrospective retrieval, i.e. retrieval of, for instance, legislation repealed by later amendments.

    Some of the results of the FAIR project have been documented in Lang/Bock 1973. In spite of this intriguing start, Austria does not seem to have opted for a continuation of the project.

    7.2.2 An example

    As mentioned above, STAIRS offers several options to the user. In this example, we shall describe a simple .. SEARCH query. We presuppose that the user has access to STAIRS at his terminal, and will select the part of the program supporting the .. SEARCH mode. He is rewarded with the screen image:

    AQUARIUS - SEARCH MODE - BEGIN YOUR QUERY AFTER THE STATEMENT NUMBER

    00001

    Let us imagine that our user is interested in cases on car accidents. He then enters the two simple words combined by the Boolean operator AND, prescribing that he will retrieve only documents containing both words:

    AQUARIUS - SEARCH MODE - BEGIN YOUR QUERY AFTER THE STATEMENT NUMBER

    00001 CAR$ AND ACCIDENT$

    In this query, the character "$" is used to prescribe right hand truncation - the system will accept as equal all words commencing with the three letters "car" and with the eight letters "accident". As response the system gives word statistics (which the user through an earlier command may have selected to skip), showing the words as they occur in the data base:

    PAGE = 1 OF 2
    CAR$ 879 OCCURRENCES
    CAR 456 OCCURRENCES 133 DOCUMENTS
    CARS 324 OCCURRENCES 76 DOCUMENTS
    CAREFUL 99 OCCURRENCES 6 DOCUMENTS

    [Page 134]


    These word statistics may fill several screen images (in the example, the indicator at the top of the screen image shows that there are only two, filled or partly filled). Flipping through them with the ENTER-key, the user arrives at the last. Pressing the ENTER-key once more, he is given the result:

    AQUARIUS - SEARCH MODE

    00001 CAR$ AND ACCIDENT$
    RESULT 1023 OCCURRENCES 42 DOCUMENTS

    At this point, the user can select to go into .. BROWSE mode, which will allow him to have the text of the retrieved documents displayed. Or he may select to refine his query. In our example, a simple refinement would be to exclude the word "careful" from the query through the use of the Boolean operator NOT.
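The steps of the example (right hand truncation, word statistics, the AND combination and the refinement by NOT) can be retraced in a small sketch. The function names and the toy index are hypothetical, and the figures are invented for illustration; they do not reproduce the screen images above.

```python
def matching_words(index, pattern):
    """Return the index words matched by a pattern; a final "$" means truncation."""
    pattern = pattern.lower()
    if pattern.endswith("$"):
        return sorted(w for w in index if w.startswith(pattern[:-1]))
    return [pattern] if pattern in index else []

def statistics(index, pattern):
    """Return (word, occurrences, documents) for every word matched by the pattern."""
    return [(w, sum(index[w].values()), len(index[w]))
            for w in matching_words(index, pattern)]

def documents(index, pattern):
    """Return the numbers of all documents containing a word matched by the pattern."""
    found = set()
    for word in matching_words(index, pattern):
        found |= set(index[word])
    return found

# Hypothetical index: word -> {document number: occurrences in that document}.
index = {
    "car": {1: 2, 2: 1},
    "cars": {2: 1, 3: 1},
    "careful": {4: 1},
    "accident": {1: 1, 3: 1, 4: 1},
    "accidents": {2: 1},
}
stats = statistics(index, "CAR$")
first = documents(index, "CAR$") & documents(index, "ACCIDENT$")
refined = first - documents(index, "CAREFUL")
```

Document 4 is retrieved by the first query only because "careful" begins with "car"; the refinement removes it. It should be noted that NOT, being a Boolean operator, removes every document in which "careful" occurs, also one that in addition mentions a car.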

    7.3 IMDOC2

    2 We are indebted to Mr. Bengt Broomé of Datecon AB (Kungsgatan 8, S-111 43 Stockholm, Sweden) for comments on this section.

    7.3.1 Characteristics

    IMDOC is a family of retrieval systems developed by the Swedish company Industri-matematik AB. It is implemented on several computers, including an IBM version, a UNIVAC version, and a minicomputer version (which may support 16 terminals). IMDOC has been under continuous development for seven years, and is operational in several generations. It has possibilities of intermediate storage of new information on a "hot-file", it offers text-handling functions for on-line text editing and has several options like on-line sort, document segmentation, phonetical search, etc.

    IMDOC is a terminal-oriented interactive system; retrieval is usually conducted on a video display terminal. The system is re-entrant, and operates under IBM OS and DOS with most TP-monitors (CICS, IMS/DC, TCAM, TSO, etc.). IMDOC is based on a full-text philosophy - every word of the stored text may be used in a query as a search criterion, with the exception of the words on a predefined stop list. In contrast to STAIRS, IMDOC cannot attach fixed fields to the documents - a lack that is partly remedied by the use of defined prefixes (cfr. above in section 5.5) together with the arithmetic search feature (greater than, equal to, less than).

    Queries are formulated in a sequential manner. Search words are given by the user, and at each level the system gives the user a response. The user can elaborate the query further, or have the text of the retrieved documents displayed. Several output formats are available, among them a

    [Page 135 ]


    KWIC format similar to the one mentioned in respect of the LEXIS system (above in section 6.4), though query terms are not highlighted by a white field, but rather by an asterisk inserted in front of the term. Ranking functions are available in later generations.
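    The KWIC output with an asterisk in front of each query term can be illustrated as follows. This is a minimal sketch and not IMDOC code: the function name, the `width` parameter (number of context words on each side) and the punctuation handling are assumptions made for the illustration.

```python
def kwic(text, query_terms, width=3):
    """Return one context line per hit, marking each hit with a leading '*'."""
    words = text.split()
    wanted = {t.lower() for t in query_terms}
    lines = []
    for i, word in enumerate(words):
        # Compare without trailing punctuation, but print the word as stored.
        if word.lower().strip(".,;:") in wanted:
            left = words[max(0, i - width):i]
            right = words[i + 1:i + 1 + width]
            lines.append(" ".join(left + ["*" + word] + right))
    return lines

sample = "The driver of the car was held liable for the accident."
hits = kwic(sample, ["car", "accident"], width=2)
# hits == ["of the *car was held", "for the *accident."]
```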

    IMDOC is used in the Swedish RI system, and is also used in the Finnish legal information system, which is rather similar to the Swedish one. IMDOC has here found its main use in relation to precedents of the Supreme Administrative Court, cfr. Hallberg 1972.

    IMDOC is a simple system both in design and use. Emphasis has been put on effective updating routines (updating on-line) and compact storage. Alpsten 1975:10 reports that the overhead in storage varies between 21 and 34 per cent in the data bases currently maintained, and response time is less than 5 seconds.

    Literature on IMDOC is available from Industri-Matematik - a short description is also given by Arvén/Henriksson/Leimdörfer 1970, and in English by Leimdörfer 1973.

    7.3.2 An example

    In contrast to the several options offered by STAIRS, the user in IMDOC does not have any choice - he has to use the retrieval strategy corresponding to a .. SEARCH query in STAIRS. We presuppose that the user has gained access to the system, and has selected a data base. The system responds by showing a blank screen image except for some special characters used for certain messages:

    If the user is still interested in cases on car accidents, he will enter this query on the screen. As IMDOC - in contrast to STAIRS - in this respect is command-driven, the user writes:

    QUE car*, accident*

    QUE is the operator indicating the opening of a question. Comma is one of several alternate ways of expressing the logical operator and. In this example, the user will get a response from the system telling how many documents contain the words "car" and "accident" (truncated to include all words beginning with those letters):

    QUE car*, accident*

    42

    At this point, the user can go on modifying the query, or he may break off, displaying the retrieved documents.

    [Page 136 ]


    7.4 GOLEM

    7.4.1 Characteristics

    GOLEM is an acronym for Grosspeicherorientierte Listenorganisierte Ermittlungsmethode, and is a general purpose information retrieval system developed by Siemens. The version described below is GOLEM 2, which operates on Siemens DVA 4004/135-265.

    GOLEM is a terminal-oriented interactive system; retrieval is usually conducted on a video display terminal. The user constructs his query, and may reformulate it on the basis of feedback information.

    GOLEM may be used on any type of stored text, but favours a full-text approach, as every word of the stored documents may be used in a query as a search criterion. The only exception is a list of stop-words predefined by the user.

    GOLEM may be integrated with PASSAT (Programm zur automatischen Selektion von Stichwörtern aus Texten), which conducts a preanalysis of input texts and creates a structured thesaurus where word forms are referred back to basic forms, with synonym- and associative networks, etc.

    GOLEM will resolve the homonym problem by having "aspects" associated with words, the "aspect" illuminating in which sense the homonym occurs in the text. Documents may be structured or unstructured: in a structured document GOLEM permits retrieval on predefined fields.

    Queries may be constructed with Boolean operators in a two-step operation. There are possibilities of using a kind of positional logic (Indizierung), and there are simple procedures for look-up in the thesaurus.

    A general description of GOLEM 2 may be found in Gosholz/Urbach, cfr. also Hahn 1972.

    GOLEM is used in the JURIS (Germany) "development system" in Bundessozialgericht, and has been selected as the basic retrieval system for the future development of JURIS (cfr. above in section 5.6.4-5). Siemens is, however, working on a new retrieval system, provisionally known as CONDOR (Communication in Natürlicher Sprache mit Dialog-Orientierten Retrieval-Systemen), which may replace GOLEM.

    7.4.2 An example

    In this example, we will describe a simple SUCHEN procedure. We presuppose that the user

    [Page 137 ]


    has access to GOLEM at his terminal, and that he has selected a data base. The system then inquires what sort of procedure he wants to use, and the user responds with:

    SUCHEN

    The system comes back with:

    DESKRIPTOREN EINGEBEN

    Our user is still occupied with problems of car accidents, and formulates this on the screen:

    DESKRIPTOREN EINGEBEN

    CAR

    + ACCIDENT

    On this basis, the system gives information on the number of documents containing the specified words:

    DESKRIPTORENLISTE

    1. CAR (764)
    2. ACCIDENT (342)

    ENDE DESKRIPTORENLISTE

    NAECHSTE ANWEISUNG

    The user now combines the two words with the Boolean operator and, which in GOLEM is represented by the letter "U" for "und":

    1U2

    To this the system responds with the number of documents satisfying the query:

    ANZAHL ZIELINFORMATIONEN: 24

    NAECHSTE ANWEISUNG

    At this point, the user may look up the retrieved documents, or he may go on refining his query.
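    The two-step dialogue above (first list the descriptors, then combine them by their numbers) can be sketched as follows. This is an illustration only, not GOLEM code: the class, its method names and the toy postings are hypothetical, and only the "U" (und) operator shown in the example is handled.

```python
class DescriptorSession:
    """Toy two-step search: list descriptors first, then combine them by number."""

    def __init__(self, postings):
        # postings: descriptor -> set of document ids (hypothetical data)
        self.postings = postings
        self.listed = []

    def enter_descriptors(self, *descriptors):
        """Step 1: number the descriptors and report their document counts."""
        self.listed = list(descriptors)
        return [(n, d, len(self.postings.get(d, set())))
                for n, d in enumerate(self.listed, start=1)]

    def combine(self, instruction):
        """Step 2: evaluate an instruction such as '1U2' (U = und = AND)."""
        numbers = [int(part) for part in instruction.upper().split("U")]
        sets = [self.postings.get(self.listed[n - 1], set()) for n in numbers]
        return set.intersection(*sets)

session = DescriptorSession({"CAR": {1, 2, 3}, "ACCIDENT": {2, 3, 4}})
listing = session.enter_descriptors("CAR", "ACCIDENT")
result = session.combine("1U2")
```

    Keeping the numbered descriptor list in the session is what lets the second step refer to terms by number rather than by retyping them.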

    7.5 STATUS [3]

    [3] We are indebted to Mr. Norman Price of the Atomic Energy Research Establishment, Harwell, for comments on this section.

    7.5.1 Characteristics

    As described above in section 5.7.1, STATUS is a system developed by the British Atomic Energy Authority. STATUS is an acronym for "Statute Search", the system originally having been designed for retrieval of statute law. It has proved, however, to be a general purpose retrieval system. The system operates on a number of different computers, cfr. above in section 5.7.1, and is mainly written in FORTRAN.

    [Page 138 ]


    STATUS is a terminal-oriented interactive system; retrieval is usually conducted on video display terminals. STATUS may be used on any type of stored text, but encourages a full-text approach, as every word of the stored documents - excluding a list of predefined stop-words - may be used as a search criterion.

    Retrieval is mainly conducted through construction of a query with Boolean operators (.and., .or., .not.). Special positional operators may be used for defining the maximum distance (measured in number of words) which is acceptable between words used in the query.
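    A positional operator of this kind can be illustrated with a small word-position index. This is a sketch under assumptions, not the STATUS implementation: distance is taken here to be the difference between word positions, and the function names are hypothetical.

```python
def positions(text):
    """Map each lower-cased word to the list of its word positions."""
    index = {}
    for pos, word in enumerate(text.lower().split()):
        index.setdefault(word, []).append(pos)
    return index

def within(text, first, second, max_distance):
    """True if some occurrences of the two words lie at most max_distance words apart."""
    index = positions(text)
    return any(abs(p - q) <= max_distance
               for p in index.get(first, [])
               for q in index.get(second, []))

sentence = "the car was damaged in a serious accident"
# "car" is at position 1 and "accident" at position 7, i.e. six words apart.
close_enough = within(sentence, "car", "accident", 6)
too_far = within(sentence, "car", "accident", 5)
```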

    A special feature is the .macro.-function, which allows the user to define a query as a .macro.. The query is then stored under the name given to the .macro., and may be included in later queries as a single term. A .macro. may also include variables, to be set by parameters given in the new queries. Through the use of the .macro.-function, the lack of thesaurus facilities is to some extent remedied, and the user has a powerful tool for developing refined queries for standard questions.
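    The idea of a stored, parameterised query can be sketched as simple text substitution. The notation below is invented for the illustration and is not the STATUS syntax: the `%1`, `%2` placeholders, the `name(argument)` call form and both function names are assumptions.

```python
import re

MACROS = {}

def define_macro(name, template):
    """Store a query under a name; '%1', '%2', ... mark parameters (assumed notation)."""
    MACROS[name] = template

def expand_macros(query):
    """Replace every 'name(arg, ...)' or bare 'name' with the stored query text."""
    def substitute(match):
        name, args = match.group(1), match.group(2)
        if name not in MACROS:
            return match.group(0)          # ordinary search word: leave untouched
        text = MACROS[name]
        for n, arg in enumerate((args or "").split(","), start=1):
            text = text.replace(f"%{n}", arg.strip())
        return "(" + text + ")"
    return re.sub(r"\b([a-z_]+)(?:\(([^)]*)\))?", substitute, query)

define_macro("vehicle", "car* .or. lorr* .or. bus*")
define_macro("accident_in", "accident* .and. %1")
expanded = expand_macros("vehicle .and. accident_in(1975)")
# expanded == "(car* .or. lorr* .or. bus*) .and. (accident* .and. 1975)"
```

    The "vehicle" definition shows how a stored query can stand in for a small thesaurus entry, which is the use suggested in the text.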

    A number of alternative output formats are available.

    STATUS has been used for experimental legal retrieval at Harwell, and is used by the Council of Europe for their European treaties (cfr. above at section 5.7.1). Norway has acquired STATUS for use within the public administration, and has since summer 1975 been modifying it to some extent. Conceptor-based retrieval is now available in the Norwegian version, as well as storing of queries, and better facilities for using a building block strategy through the numbering of queries for later reference. The Norwegian version is known as NOVA*STATUS, an acronym for Norsk versjon av STATUS.

    A revised version of STATUS - STATUS II - has recently been released by Harwell. This version has an improved file structure for on-line updating etc., so the data base design is more complex, but the search technique is the same - cfr. Price 1975, Introduction 1975.

    7.5.2 An example

    We presuppose that our user has gained access to the system, and has selected his data base. STATUS then prompts him with the message:

    QUESTION PLEASE

    Our user then formulates the query on car accidents in the way which has by now become familiar:

    [Page 139 ]


    QUESTION PLEASE

    >

    car* + accident*?

    The system responds by giving the number of documents containing both words:

    QUESTION IS SATISFIED BY 42 DOCUMENTS

    If the number is less than 8, the titles of the documents will automatically be listed out. If not, the system inquires if the user wants the list - and from there on, the user may gain access to the text. He may also, of course, choose to refine his query.

    7.6 CONTEXT [4]

    [4] We are indebted to Mr. Bernhard Vischer of DATA + PLUS for his comments on this section.

    7.6.1 Characteristics

    CONTEXT is a text retrieval system developed by the Swiss firm UNIDATA AG, which was founded in 1968 by Swiss lawyers. It should not be confused with the company formed by several European computer manufacturers under the same name at a later date. CONTEXT is now being marketed by DATA + PLUS, Meisenweg, 9, CH-8038 Zurich, Switzerland.

    Designed in 1968-1970, CONTEXT has been presented at various conferences in Switzerland, Germany, Italy, and Holland since 1970. The company Juristische Datenbank AG has been founded to operate the system.

    CONTEXT is a priority-oriented remote batch system. It is written in FORTRAN, and has to be interfaced to a data base management system (for instance TOTAL or System 2000).

    The main point of interest is the basic philosophy of the system, which is based on vector retrieval and normal language queries.

    The query can be of unlimited length, and the user emphasises important words, numbers, etc. by repeating them one or more times. As query, the user may also quote a document which is part of the document collection. This document will then be taken as the query.

    The programs allow for the creation of a synonym thesaurus, and if a thesaurus exists, the query is expanded with the words defined as synonymous to those used in the query formulation.
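    Query expansion through a synonym thesaurus can be sketched as follows. This is only an illustration: the thesaurus contents and the function name are hypothetical, and the choice to keep repeated query words (so that emphasis by repetition survives the expansion) is an assumption.

```python
def expand_query(words, thesaurus):
    """Append the synonyms of each query word, keeping the user's own repetitions."""
    expanded = []
    for word in words:
        expanded.append(word)
        for synonym in thesaurus.get(word, []):
            if synonym not in expanded:
                expanded.append(synonym)
    return expanded

THESAURUS = {"car": ["automobile", "vehicle"]}   # hypothetical entries
terms = expand_query(["car", "accident", "car"], THESAURUS)
# terms == ["car", "automobile", "vehicle", "accident", "car"]
```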

    CONTEXT retrieves documents by what the designers term "similarity

    [Page 140 ]


    measures". Their hypothesis is that the relevance probability increases with increasing "similarity measure". Two "similarity measures" are employed: the "overlap measure" and the "closeness measure". The "overlap measure" requires that query and retrieved document "overlap" by at least one word - i.e. all documents containing at least one word identical to a word from the query are considered retrieved. (Common words are excluded by an "empty word list"; the present system uses about 200 such words.)

    The "closeness measure" is based on vectors, cfr. below at section 10.5.5 (2). The vector is created from the query and compared to the vectors representing the documents considered retrieved through the "overlap measure". For comparison, the cosine function is employed. The resulting ranked set of documents is then presented to the user by a "scope" - a report listing documents in order of decreasing "similarity measure".
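    The combination of the two measures can be sketched as a filter followed by a ranking. This is a minimal illustration and not the CONTEXT implementation: plain term-frequency vectors are assumed (the actual weighting is not given here), the short empty word list is a stand-in, and all names are hypothetical.

```python
import math
from collections import Counter

EMPTY_WORDS = {"cases", "the", "of", "a", "in"}   # stand-in for the "empty word list"

def vector(text):
    """Term-frequency vector; a word repeated in the query gets a larger weight."""
    words = [w.strip(".,?").lower() for w in text.split()]
    return Counter(w for w in words if w and w not in EMPTY_WORDS)

def cosine(u, v):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

def rank(query, docs):
    """Keep documents sharing at least one word with the query, then sort by cosine."""
    q = vector(query)
    scored = []
    for doc_id, text in docs.items():
        d = vector(text)
        if q.keys() & d.keys():        # "overlap measure": at least one common word
            scored.append((cosine(q, d), doc_id))
    # Decreasing similarity; ties are broken by document id.
    return [doc_id for score, doc_id in sorted(scored, key=lambda s: (-s[0], s[1]))]

DOCS = {1: "car accident damages", 2: "car sale contract", 3: "lease of a flat"}
plain = rank("car accident", DOCS)
weighted = rank("contract contract car", DOCS)
```

    In the second query the word "contract" is repeated, which raises its weight in the query vector and moves document 2 ahead of document 1; document 3 shares no word with either query and is never considered.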

    Vector-based retrieval and the resulting ranked set of documents are related to the more conventional ranking algorithms based on word frequency - cfr. STAIRS (above at section 7.2.1) and QL-systems (above at section 6.3). The basic approach is, however, quite different.

    Queries in CONTEXT may very well be lengthy. As provided for by the quoting facility, a document may be qualified as a query, and similar documents may be retrieved. The facility of using a known relevant document - retrieved for instance through a first, provisional query - gives CONTEXT a dynamic quality stressed by its designers.

    The CONTEXT system has been used for experiments on a pilot data base of 2 000 decisions in Swiss civil law, and for a comparative experiment where the performances of CREDOC and CONTEXT were compared, rather to the latter's advantage (Prestel 1971b). A description may be found in Vischer 1971, cfr. also CONTEXT 70 (1971) and CONTEXT (1972). A recent description in English is given in DATA + PLUS 100 (1976).

    7.6.2 An example

    We presuppose that our user has gained access to the system, and has selected the data base (private law). He is still concerned about car accidents, and is invited by the system to describe his problem. (The example is reconstructed on the basis of CONTEXT 70 (1971) and Vischer (1971).)

    display problem

    000100-Cases concerning car accidents?

    [Page 141 ]


    This query is expanded, and the system will return with a listing of occurrences:

    Private law Status 31.12.1975

    Vokabularium
    ************
    Deskriptoren    Frequenzen       Dokumente
                    (F=Fuellwort)    (Mengen)
    Cases           F
    concerning      1258             674
    car             851              267
    accidents       401              137

    Menge der aehnlichen Dokumente
    ******************************
    (1-te Naeherung)
    =51

    The user now knows that there is a total of 51 documents containing at least one of the three words that were accepted as non-stopwords. Next he has the system rank the documents, and is given the response:

    Private law Status 31.12.1975
    Aehnliche Dokumente
    *******************
    (Ordnungskriterium: Menge der Deskriptoren)
    Menge der Deskriptoren zum Problem = 3
    (1-te Naeherung)
    Rang: 1
    Mass: 3
    Deskriptoren: concerning
                  car
                  accidents
    Dokument: 36 (intern)
    Abschnitt: Quelle
    Wortlaut:
    1 Smith vs. Tip Top Taxi Co.
    2 99 Imaginary Reporter 66

    At this point, our user may have a look at the document ranked first, or he may determine to go to the next document in the array - or he may refine his query. This he may do through a novel and more extensive query (our example obviously does not adequately demonstrate the advantages of normal language queries). A reformulation may also take the form of assigning weights to the terms in the query by repeating them.

    [Page 142 ]

