Researchers are often asked to demonstrate the impact of their work, and administrators and committees aim to make accurate assessments of the quality and impact of researchers and their work. Often both of these start by looking at things you can count – how much have you published, how well-known are the places where you have published, how many times have you been cited?
While all of these can be useful indicators, it’s also important to understand where these figures come from, and how to use them in ways that are responsible and not misleading. Below you’ll find some brief overviews of the best known metrics, caveats about using them, and references to many other resources where you can learn more.
First, some key things to keep in mind as you use any research assessment metrics
1. The most widely used research metrics tend to value only a subset of what researchers actually do, by focusing on the things that are easiest to count systematically.
Common metrics often exclude valuable scholarly outputs for which it is difficult to find tangible, easily countable evidence. While it’s more difficult and time-consuming to do, it’s important to think first about what you really value and then look for evidence of it, rather than starting from the limited point of what data is easily available, or what has been “traditionally” used (if those traditions may be harmful, why continue them?).
In this blog post, Christopher P. Long, Professor of Philosophy and Dean of the College of Arts & Letters at Michigan State University, writes “If our values don’t drive our metrics, our metrics will distort our values” and suggests some approaches to new values-based metrics and assessment approaches.
2. Different disciplines have different patterns of publishing, ways of citing, and ways of listing authors, leading to misunderstandings when metrics are thought to be normalized across disciplines.
For example, for disciplines that tend to publish a small number of books rather than a large number of articles, h-index and other journal citation measurements will not be informative, and are likely to be misleading.
As this 2006 study found, “Faculty with earlier surname initials are significantly more likely to receive tenure at top ten economics departments, are significantly more likely to become fellows of the Econometric Society, and, to a lesser extent, are more likely to receive the Clark Medal and the Nobel Prize” largely because “the norm in the economics profession prescribing alphabetical ordering of credits on coauthored publications” leads reviewers from other disciplines to incorrectly ascribe greater importance to first authors of economics papers. It’s important to understand the patterns within a particular discipline before ascribing value to any particular set of metrics.
3. There’s a growing amount of bibliometric evidence of systematic bias in how works are cited and how those citations are counted and valued.
This article in Nature in March 2022 references data indicating that “Over the past decade or so, bibliometric assessments have shown how citation rates for men are, on average, higher than those for women across a wide range of fields, including economics, astronomy, neuroscience and physics — even when controlling for other factors that might influence citations, such as author seniority, or the year or the journal in which a paper is published. Men also cite their own work more often than women do. A gap exists among racial and ethnic categories, too, with white scholars being cited at higher rates than people of colour in several disciplines.”
There are also differences in citation patterns for researchers whose primary language is not English, or who are not based in North America or Europe. These patterns, sometimes known as the “Matthew effect” and the “Matilda effect,” can be harmful, even if unintentional.
4. Metrics can be gamed.
There are many documented cases of researchers or others using various techniques to artificially inflate their own scores. Some kinds of metrics are more vulnerable to manipulation than others, so it’s important to understand where the data came from, how they are calculated, and what they do and don’t serve as indicators for. It also helps to look beyond the numbers and investigate the context. Many article-level and altmetrics tools allow you to see who made the reference and what they said. It quickly becomes obvious whether the numbers are inflated by spurious references, whether the references are positive or negative, and whether some come from more valuable sources than others.
Some commonly used tools for assessing research impact
- Impact factor is a measurement of the citation rate of a particular journal. Individuals do not have an impact factor; publications do – so impact factor can’t tell you anything about the quality, influence, or impact of a particular article or the researchers who authored it, only the average number of citations for all the works published there over the previous two years. However, as demonstrated in some of the resources linked below, these averages hide a lot of variation and can be misleading. Because citation practices and many other traits of a journal vary widely between disciplines, impact factors vary as well. While the number can differ depending on the tool used to count citations, Clarivate’s Journal Citation Reports (JCR) tool (formerly from Thomson Reuters) is generally the industry standard for measuring impact factor.
- The Eigenfactor is similar to impact factor in that it attempts to measure the “total importance” of a journal. It considers citations over a five-year window (rather than the two years of JCR’s impact factor), and it weights citations from more influential journals more heavily than those from less influential journals. The Eigenfactor scores of all the journals in the index are scaled to sum to 100, so that a journal with an Eigenfactor of 1.00 has 1% of the “total importance” of all the journals in the index.
- The h-index is an indicator of an individual researcher’s impact as measured by how often their work has been cited. The index takes into consideration the number of citations of a researcher’s publications (i.e., a researcher with an index of h has published h papers that have each been cited at least h times). The h-index was proposed in 2005 by physicist Jorge Hirsch and is alternately called the “Hirsch index” or “Hirsch number.” A number of services calculate h-index, and the number can vary widely depending on the publications known to that service – i.e., someone’s h-index as shown in Google Scholar may be very different from the same person’s h-index calculated by Web of Science, Scopus, or PubMed, because each of these databases has access to different sets of publications to base its calculations on. See below for some critiques of the h-index, including by its inventor.
- Article-level metrics: Instead of attempting to assess journals or individuals by counting citations across large sets of publications, article-level metrics indicate the impact of, or attention to, individual articles in context. PLoS pioneered this approach, and they measure usage (page views, downloads), citations (using Scopus, Web of Science, PMC, etc. data), and social networking mentions. Other publishers and repositories have begun to use article-level metrics as well, often displaying badges from Altmetric or Dimensions or Plum Analytics, which show both how many times a work has been cited or linked and the context in which it has been referenced, allowing for a more nuanced understanding of what others are saying about a particular work, not just the fact that it was cited.
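To make the caveat about averages concrete, here is a minimal sketch in Python (with purely illustrative numbers) of why an impact factor, which is essentially a mean citation rate, can hide a skewed citation distribution:

```python
from statistics import mean, median

# Hypothetical per-article citation counts, over two years, for the
# articles in one small journal (the numbers are illustrative only).
citations = [210, 40, 6, 3, 2, 1, 1, 0, 0, 0]

# An impact factor is essentially this mean: total citations to the
# journal's recent items divided by the number of citable items.
print(mean(citations))    # 26.3

# The median tells a different story: most articles here are cited
# rarely, and a couple of highly cited papers drive the average up.
print(median(citations))  # 1.5
```

A journal with these numbers would report an impact factor around 26 even though half of its articles were cited once or less – exactly the kind of variation the averages hide.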
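The idea behind the Eigenfactor – that citations from influential journals should count for more – can be illustrated with a toy power-iteration calculation. This is a simplified sketch with made-up numbers, not the actual Eigenfactor algorithm (which uses five years of JCR data, excludes journal self-citations, and adds a damping term):

```python
# citations[i][j] = citations from journal j to journal i
# (three hypothetical journals A, B, C)
citations = [
    [0, 2, 1],   # A is cited twice by B and once by C
    [3, 0, 1],   # B is cited three times by A and once by C
    [1, 1, 0],   # C is cited once each by A and B
]
n = len(citations)

# Column-normalize so each journal distributes one unit of influence.
col_sums = [sum(citations[i][j] for i in range(n)) for j in range(n)]
P = [[citations[i][j] / col_sums[j] for j in range(n)] for i in range(n)]

# Power iteration: repeatedly redistribute influence until it settles,
# so citations from high-scoring journals raise a journal's own score.
scores = [1.0 / n] * n
for _ in range(100):
    scores = [sum(P[i][j] * scores[j] for j in range(n)) for i in range(n)]

# Scale the scores to sum to 100, as Eigenfactor scores do.
total = sum(scores)
scores = [100 * s / total for s in scores]
print([round(s, 1) for s in scores])
```

In this toy network, journal B ends up with the highest score even though A and B receive similar raw citation counts, because B's citations come disproportionately from the higher-scoring A.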
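The h-index definition above translates directly into a few lines of code. A minimal sketch in Python (the function name and example citation counts are our own):

```python
def h_index(citations):
    """Largest h such that h of the papers have at least h citations each."""
    counts = sorted(citations, reverse=True)  # most-cited papers first
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank   # the paper at this rank still supports an h of `rank`
        else:
            break
    return h

# Six papers with these citation counts give an h-index of 3: three
# papers have at least 3 citations each, but not four with 4 or more.
print(h_index([25, 8, 5, 3, 3, 0]))  # 3
```

This also makes clear why the same person’s h-index differs between Google Scholar, Web of Science, and Scopus: feed the function a different set of known publications or citation counts and you get a different h.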
Detailed guides to many more tools for assessing research impact
These guides provide a good overview and more detail of many different research metrics that are available, how to find them, and how to use them:
- Research Impact guide from the Duke Medical Center Library and Archives
- Research Impact & Scholarly Profiles guide from the University of California
- Metrics Toolkit
Resources about understanding context and using metrics responsibly
These resources provide further information about the context for widely used research metrics, some caveats that are important to understand about them, and how to use them responsibly:
- Beat it, impact factor! Publishing elite turns against controversial metric (Nature, July 2016)
- What’s wrong with the journal impact factor in 5 graphs (Nature, April 2018)
- Rethinking impact factors: better ways to judge a journal (Nature, May 2019)
- The allure of the journal impact factor holds firm, despite its flaws (Nature, August 2019)
- What’s wrong with the h-index, according to its inventor (Nature, March 2020)
- Why the h-index is a bogus measure of academic impact (The Conversation, July 2020)
- The rise of citational justice: how scholars are making references fairer (Nature, March 2022)
Duke Resources for Assessing Research Impact
- In May 2018, the Duke Tenure Standards Committee prepared a report for the Provost evaluating Duke’s criteria for tenure and promotion, and making recommendations to address increasingly diverse forms of scholarship, new modes of communicating scholarship, and avoiding bias. Regarding metrics, the report notes:
While the above metrics should continue to be used, assessments at all stages—hiring, three-year review, tenure, promotion to full—should take account of their limits and flaws as indicators of scholarly excellence. Tendencies to over-rely on them and reduce engaged and careful reading of files – whether by ad hoc committees, departmental/School reviews, or APT – need to be resisted. What is to be stressed here is the need to take quantitative measures in a qualitative context. That is, rather than just taking any one rubric at face value, other considerations must be factored in such as selecting the right comparison of peers – and, again, careful and engaged reading of the file.
The report also makes recommendations about how to recognize and address systemic bias, and how to assess new forms of scholarship, the arts, and other areas that are not well served by “traditional” metrics.
- The DukeSpace repository and Duke Research Data Repository serve as places to archive and disseminate open access versions of faculty and student publications, data, and other scholarly outputs, and make them findable to the public through search indexes like Google and Google Scholar. These repositories track views and downloads of items that have been made available by Duke researchers, so you can see how many times your research has been accessed. Look in the sidebar of the repository item page to find usage statistics for any given item.
Guides for evaluating digital scholarship, from scholarly societies
Several scholarly societies have released guidance for how to evaluate new digital forms of scholarship:
- Guidelines for Evaluating Work in Digital Humanities and Digital Media from the Modern Language Association (MLA)
- Professional Evaluation of Digital Scholarship in History from the American Historical Association (AHA)
- Guidelines for the Evaluation of Digital Scholarship in Art and Architectural History for Promotion and Tenure from the College Art Association (CAA) and the Society of Architectural Historians (SAH)
Efforts to reform research assessment and use metrics responsibly
The San Francisco Declaration on Research Assessment (DORA) was published in 2013 and “calls for placing less emphasis on publication metrics and becoming more inclusive of non-article outputs.” DORA places specific emphasis on how flawed the journal impact factor is, despite its widespread use.
- DORA calls for academic institutions to pledge to:
- Establish new criteria for hiring, tenure, and promotion that emphasize the quality of the research content “rather than the venue of publication.”
- Consider other research outputs beyond journal publications so that promotion and funding decisions are no longer made based on citations and quantitative metrics alone.
- Be aware of the different types of metrics and their strengths and weaknesses.
- Publishers and metrics providers are encouraged to:
- De-emphasize the importance of impact factor in their marketing and explain that it is only one method of assessing research impact.
- Be transparent about their data collection methods.
The Leiden Manifesto is a document published in 2015 that provides a list of ten major principles for changing the way research is assessed:
- Quantitative evaluation should support qualitative, expert assessment.
- Measure performance against the research missions of the institution, group, or researcher.
- Protect excellence in locally relevant research.
- Keep data collection and analytical processes [about research] open, transparent, and simple.
- Allow those evaluated to verify data and analysis [about their work].
- Account for variation by field in publication and citation practices.
- Base assessment of individual researchers [at an institutional level] on a qualitative judgement of their portfolio.
- Avoid misplaced concreteness and false precision [of quantitative methods of data collection].
- Recognize the systemic effects of assessment and indicators and prefer a “suite of indicators” over a single metric such as journal impact factor.
- Scrutinize indicators regularly and update them to reflect changing research ecosystems.