By Linda Beebe
The First 1 Million Records
Although the history of abstracting psychology literature began in the late 19th century, the first million records in PsycINFO included only literature from the 1960s forward. It took nearly 30 years, from 1967 to 1995, for APA to produce the first 1 million records.
The Evolution to Electronic Retrieval
PsycINFO began in 1967 with the first electronic publication of the bibliographic records included in that year's print Psychological Abstracts. The ability to produce an electronic product so early in the computing evolution came about as a result of grants from the National Science Foundation (NSF) for a Scientific Information Exchange Project.
In 1965 the APA Publications Board approved an experimental study of the feasibility of producing Psychological Abstracts by the Photon process, which would yield magnetic tapes that could be used for information retrieval.
The production process was crude by today's standards, as the electronic output was the result of a long paper-and-pencil creation process. However, when implemented in 1966, it greatly changed nearly everything about the production of Psychological Abstracts. APA increased staff from 5 to 12, and the number of journals covered more than doubled.
With a monthly, rather than a bimonthly, publication schedule, lag times were cut dramatically from as much as 3 years to as little as 3 months. The quantity of abstracts published also increased, moving from 8,381 in 1963 to 13,622 in 1966; and by the end of the decade the annual output had risen to 18,068.
The National Clearinghouse of Mental Health Information of the National Institute of Mental Health began purchasing abstracts on magnetic tape with the 1967 volume of Psychological Abstracts. Although the start-up was initially slow, the evolution from print to electronic retrieval was underway.
Between 1968 and 1972, NSF support for the National Information System for Psychology (NISP) enabled APA to install in-house keyboarding facilities, create standardized input and bibliographic control, and develop a psychology thesaurus.
Once the capability to offer electronic retrieval services was in place, APA initiated several delivery services.
In 1971 APA began offering 3 services:
- PA Direct Access Terminal (PADAT)
- PA Search and Retrieval (PASAR)
- PA Tape Edition Lease and Licensing (PATELL)
At that point the database contained 75,000 records. With PADAT, researchers could perform interactive searches using a terminal in their own facilities. PASAR was a mail order service in which APA staff would perform the searches and mail the results to the researcher. PATELL made Psychological Abstracts tapes available to academic institutions capable of using them.
In 1973 APA moved to the Lockheed Corporation's system DIALOG, and the database was offered on a royalty basis to any qualified information analysis center.
In 1978 Psychological Abstracts was added to the offerings of the Bibliographic Retrieval Services (BRS), which had emerged in 1977. The database was also available through overseas distributors in Germany, Sweden, Canada, and South Africa.
With the increased services, it became clear that APA needed to educate users about the products and instruct them on how to find information. In 1976 APA released the "PsycINFO Users Reference Manual," a loose-leaf publication with semiannual updates included in the subscription. This publication followed the tutorials and workshops begun in 1975. Throughout the 1970s and 1980s, APA put special emphasis on what was then called "User Services."
Growth in Content
Keeping up with the growth in the psychological literature was a struggle for APA all through the 1970s and 1980s. Page limits in the print journal led to long lags for the publication of abstracts.
In 1978, APA determined that dissertation abstracts would be included only in the electronic file, book coverage would be eliminated, and coverage of lower priority journals would be suspended for 2 years. This trend would continue.
In 1980 PsycINFO published 31,764 abstracts in electronic form with 26,844 appearing in Psychological Abstracts. By 1989, the annual total had grown to 52,442 abstracts with only 41,583 in the print journal. The number of journals covered had increased from 988 to 1,304.
However, the backlog continued to grow. In the mid-1980s the PsycINFO Advisory Committee reviewed all covered English-language journals by content-cluster areas with an eye toward eliminating lower priority journals. Subsequently, 269 journals were dropped between 1987 and 1988.
At the same time, APA continued to wrestle with the coverage of books. After experimenting with a print product called PsycBOOKS, the organization decided to fold all book records into PsycINFO and the CD-ROM product PsycLIT, which had been launched in 1986. From that point forward, all scholarly books deemed significant to psychology have been covered in PsycINFO.
The quality of covered journals continued to improve in the 1990s. Early in the decade the PsycINFO Advisory Committee undertook a comprehensive review of non-English-language journals that included reviews by other national psychological associations and APA members in different countries. As a result, APA dropped more than 220 journals but added an equal number of greater relevance and quality.
Continued improvement in APA's technology supported ongoing increases in the annual output of bibliographic records.
In 1994, PsycINFO developed a new in-house computerized abstract production system, the PTS ("Psyched to Serve"). Once the backlog created from the implementation was out of the way, APA was set to increase production significantly.
In 1995, the 1 millionth record was released into PsycINFO, 28 years after the electronic product was begun.
The Second Million—Growth from Legacy Documents
As the trend in information retrieval moved ever more steadily away from print to electronic, it became clear that APA needed to digitize the historical content that had appeared in print since the 1890s. Without that content the scientific record contained a huge void.
1990s Backfiles Project
The backfiles project began in 1993 as PsycINFO staff worked with an offshore vendor to convert the 1927 to 1966 volumes of Psychological Abstracts to machine-readable form. Altogether these constituted 275,000 records with bibliographic information and abstracts.
They also digitized 3,500 records published in Psychological Bulletin between 1919 and 1926 as well as 2,700 records for 8 APA journals published between 1894 and 1919. The latter included such important journals as the American Journal of Psychology, Psychological Monographs, Psychological Review, and Psychological Bulletin. A separate PsycINFO backfile was released in September 1996.
Staff continued adding content, correcting data, and adding new information to the records. Then in February 1998, APA released a single combined file for PsycINFO that spanned 111 years. The new total was 1.5 million records. It was then possible to map new features back through the file. Production numbers had also increased significantly.
After averaging 54,000 records per year from 1990 to 1995, APA released 63,000 records in 1997. Until 1999, records from Dissertation Abstracts International included only a bibliographic citation; however, in 1999 APA added abstracts for the records from 1995 to 1999. Since that date all dissertation records have included abstracts.
A New Century With New Tools
By 2001, the number of journals covered had risen to more than 1,700.
The first big innovation in this decade was the addition of cited references to the database. Following the announcement in June 2001, staff added citations from 2000 and 2001 journals, and the new file debuted at the American Library Association Midwinter 2002 meeting. The new feature enabled scientists to track the literature forward and backward in time. Since then, APA has added 45 million cited references to PsycINFO.
Another legacy project added approximately 11,500 records from Psychological Index, first published in 1895. This content enhanced PsycINFO's coverage of the early days of psychology and added an increased international focus. Unlike the other legacy content, Psychological Index records contained only bibliographic information.
Machine-Aided Indexing was the next innovation. APA implemented state-of-the-art software in 2002 to assist skilled indexers in the production teams. The goal was to bring greater consistency and speed to the labor-intensive indexing process and capitalize on the intellectual value-add APA's staff bring to the process.
With a focus on streamlining workflows, APA built a new production system that automated receipt of journals, improved the selection process, and provided excellent management reports. Careful attention to missing issues resulted in 1,600 additional issues covered in 2002 alone.
That year APA began designating "core journals" for comprehensive coverage: those journals with a selection percentage of 90% or more. This change added more relevant records to the database and reduced the time spent considering selection. Another priority was implementing system error checks to minimize errors in the database and reduce the time spent searching for errors before release.
The development of new products such as PsycBOOKS also contributed to the growth in content in PsycINFO. For example, the initial offering of PsycBOOKS in 2004 included 230 classic books that had not been previously covered in PsycINFO, and subsequent releases included more. APA also added the bibliographic citations from the Harvard Book Lists between 1840 and 1971.
In August 2004, 9 years after the benchmark of 1 million was achieved, APA announced the release of the 2 millionth record into PsycINFO.
The Third Million—Growth from Increased Coverage
In 2004 APA completed the first round of automation in the new production system. We also began extensive outsourcing, and production staff provided intensive training and quality assurance to bring records up to APA quality standards. As a result, PsycINFO grew by 106,057 records, a 38% increase over 2006 production.
PsycINFO was well on its way to the next million records. But for this era, the increase was to come primarily from expanded coverage of new content areas and newly published content, rather than from historical literature.
Rethinking User Needs
In the years that APA struggled to keep up with core content, it was necessary to maintain strict selection guidelines that assured coverage of core psychology. Once staff had streamlined production, it was possible to look beyond what was pure psychology to what psychologists and professionals in related disciplines might need in their work, without the risk of overlooking core content. In the second half of the 2000s, as interdisciplinary work became more and more the norm, this expansion seemed even more important. Increasingly, scientists need access to a wide array of content.
Previously, for example, much of sociology was rejected because it was considered too macro and psychology was considered micro; however, it became clear that more of sociology would be important to PsycINFO users than we had previously thought. At one point, life sciences literature was considered only if it contained behavioral references. That content, too, was increasingly needed by psychologists and other social scientists. Likewise, PsycINFO began to cover more methods and statistics journals and to cover them more comprehensively.
Following suggestions that we may have been limiting our dissertation selection too severely, staff sought advice from the Electronic Resources Advisory Committee on expanding the algorithm. Expanded areas included additional areas of education, information science, mass communication, cultural anthropology, business management, marketing, ethnic and racial studies, neuroscience, public health, and speech pathology. As a result the number of dissertations covered in a year nearly doubled.
When we received requests for a cognitive science database, we began to investigate what producing one might entail. After numerous conversations with our Library Advisory Council, other librarians, and psychologists, we eventually concluded that we really needed to expand our neuroscience coverage.
Our first step was to move 86 neuroscience journals from selective coverage to comprehensive coverage and to assign them to a separate priority so that we would get them into coverage quickly. Then we began reviewing the ISI-ranked top 100 neuroscience journals and began working to acquire the 36 we were not covering. By the end of the year, only a handful remained (those that refused to permit PsycINFO to cover them), and we have continued to add neuroscience journals. Staff worked with neuroscience experts identified by members of APA's Board of Scientific Affairs to determine which journals were key and what language changes we needed.
From 2006 to 2008 PsycINFO staff focused intensively on the acquisition of new journals. At the end of 2008, PsycINFO was covering 2,450 journals, 750 more than in 2001. Since that time, reviews of lower priority journals and the demise of some journals have resulted in an evening out, so that the total number covered has remained about the same.
Additional Historical Content
Some of the increased records came from historical content. For example, when APA digitized APA journals back to volume 1, number 1 and released an additional 73,000 articles into PsycARTICLES in 2006, the requisite match-up with a PsycINFO record resulted in additional records. As APA acquired back issues of journals not originally owned by APA, we filled in content in PsycINFO.
Other APA products also contributed to the increase in PsycINFO records. For example, book reviews had not been covered in PsycINFO prior to the release of PsycCRITIQUES in 2004. Consequently, the digitization of Contemporary Psychology: APA Review of Books back to 1956 resulted in nearly 36,000 additional PsycINFO records. As APA added classic books to PsycBOOKS, the PsycINFO records grew as well. As a result, PsycINFO reached its 3 millionth record only 6 years after achieving the 2 million benchmark.
APA has continued to streamline workflows and add new tools. As the collaboration between PsycINFO and IT staff has resulted in new production systems (see sidebar), the lessons learned from producing one type of content feed into improving the workflow in a different production system.
At the end of 2010, about half of the 2,465 journals covered come in as electronic feeds, and the bibliographic content is automatically parsed into the fields in the production system. This process enables staff to spend their time on the intellectual aspects of producing a quality PsycINFO record.
Following an intense collaborative effort by PsycINFO and ITS staff, an automated reference checking and tagging system was implemented for many of the journals covered in PsycINFO. This software eliminates the need to send these references to an outside vendor for manual tagging, thereby reducing both the cost and the time required. As a result, PsycINFO records can be released as much as a week earlier than they would have been, depending on what day of the week they arrive.
As personal computers became ubiquitous in the 1990s, APA, like many abstracting and indexing services, focused less attention on training. This was in direct contrast to the intensive training of the 1970s and 1980s. However, as we moved into the new century, we received more and more requests for training.
APA has therefore concentrated on providing more documentation and training, particularly in the last 5 years. Today 4 training specialists provide live training at many conferences and special sessions, and they delivered nearly 100 webinars in 2010. They also have produced more than 60 short video tutorials that we invite librarians to download for their web pages. These can be found on YouTube and on APA's Librarian's Resource Center. In addition, about 2 dozen podcasts are available.
Expanded PsycINFO Record—Impact on Growth
In considering the growth of PsycINFO, it is important to look at how the record structure has changed over the years. The following are some additions by year:
- 2000—email subfield and DOI
- 2002—all affiliations not just first author, cited references, peer reviewed field
- 2003—tests and measures field, supplemental data field, format availability field, reviewed item field
- 2005—publication type, document type, and book type field replaced form/content field; methodology field, test/survey appended attribute
- 2006—sponsor field
- 2008—article identifier, document length
- 2009—publication date, publication status, first posting, publication history, copyright holder
All of these have added complexity to the production of the database, but offer increased functionality for the user.
Annual reloads also have included other refinements. For example, in 2004, APA added a Historical Note to the Thesaurus and began mapping new Thesaurus terms back to older records. That year unpublished tests were added to the Tests and Measures field. In 2005 See Case linking was added to abstracts.
APA will continue to expand the content in PsycINFO. The 2010 additions will total nearly 200,000 records, and we expect even more in 2011. We continue to seek new journals, and we are particularly concerned about filling any holes.
As always, we welcome suggestions on content or functionality. Recommendations and feedback from our users have been enormously helpful as we have grown the database. We look forward to your input as we embark on producing the next million records.
Senior Director, PsycINFO
APA Database Production Systems
- Hermes—for PsycINFO
- Apollo—for PsycEXTRA
- Athena—for PsycARTICLES
- Poseidon—for PsycCRITIQUES
- Artemis—for PsycTESTS