TritaH (Scholar H-Index Batch Calculator) - Luca Boscolo

From VIAWiki

Revision as of 22:04, 5 July 2013 by Luca

TritaH (Scholar H-Index Batch Calculator) is software designed by Luca Boscolo that automatically calculates the h-index and other parameters using the Google Scholar database.

The software is written in C# and VB.NET on the .NET Framework. It runs on a Windows Web Server machine and stores its data in a SQL Server database.


DISCLAIMER

Since many people have contacted me saying that TritaH does not capture all their publications, I must point out that TritaH displays the results returned by the Google Scholar database for a specific search. Some publications may not be captured by the Google Scholar search, and TritaH therefore does not display them. There are several reasons for this. I think the most common is that some publications are found only when the search uses the initial of the first name, and this kind of search usually also brings in many homonyms. For this reason I have created the My Citations service, which uses Google Scholar MyCitations to let authors select their own publications.

The Method

30 JUNE 2012 - PLEASE NOTE: the method has been modified and now produces the same results as Harzing's Publish or Perish.

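Here is a comparison with the other major software tools (01/07/2012). Each cell gives H-Index / Papers / Citations / Years; "??" means the value is not reported, and TIS reports only the H-Index.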

Academic | Boscolo's TritaH Scholar H-Index Batch Calculator | Harzing's Publish or Perish | Ianni's Scholar H-Index Calculator (quoted author name) | TIS
--- | --- | --- | --- | ---
CM Croce | 149 / 1000 / 89613 / 42 | 149 / 1000 / 89558 / 42 | >100 / >100 / 50160 / ?? | 149
Napoleone Ferrara | 118 / 568 / 80561 / 31 | 118 / 568 / 80561 / 31 | >100 / >100 / 72217 / ?? | 132
Alberto Mantovani | 116 / 1000 / 53187 / 38 | 116 / 1000 / 53187 / 54 (a) | >100 / >100 / 34452 / ?? | 130
Giorgio Trinchieri | 109 / 384 / 44153 / 38 | 109 / 384 / 44153 / 38 | >100 / >100 / 36022 / ?? | 123
Ettore Appella | 103 / 507 / 41543 / 52 | 103 / 507 / 41543 / 52 | >100 / >100 / 30153 / ?? | 115
Giuseppe Remuzzi | 108 / 883 / 44946 / 36 | 108 / 883 / 44946 / 113 (b) | >100 / >100 / 28434 / ?? | 114
Tomaso Poggio | 96 / 694 / 50167 / 42 | 96 / 694 / 50167 / 42 | 96 / ?? / 43753 / ?? | 104
Dario Alessi | 85 / 266 / 34084 / 22 | 85 / 266 / 34084 / 22 | 85 / ?? / 32256 / ?? | 94
Piero Anversa | 87 / 266 / 34084 / 22 | 87 / 266 / 34084 / 22 | 87 / ?? / 31480 / ?? | 94
Luigi Tavazzi | 66 / 351 / 22031 / 35 | 66 / 351 / 22031 / 35 | 64 / ?? / 20143 / ?? | 66

(a) The earliest publications date from 1975, with the exception of one from 1959 with zero citations; TritaH does not pick it up automatically.
(b) The publication list shows a publication from 1900 with 0 citations; TritaH ignores it automatically.



As this comparison shows, Boscolo's Scholar H-Index Batch Calculator and Harzing's Publish or Perish give the same results, while Ianni's Scholar H-Index Calculator is limited to the first 100 publications and runs only as a Mozilla Firefox add-on. Publish or Perish must be installed on your computer, and Google blocks it after about 100 queries. Scholar H-Index Batch Calculator, by contrast, requires no installation, since it runs on a website, and has no usage limits, although you have to provide an email address to receive the results. Boscolo's TritaH is also more accurate than Harzing's PoP when calculating the academic age.


This software automates the steps a user would follow to calculate the h-index from the Google Scholar database: it connects to the Google Scholar website, enters the name and surname, filters by area if required, and retrieves the publication list. It then downloads this publication list into a database together with the related information, that is:

  • complete list of authors,
  • number of citations,
  • publication link,
  • publication title,
  • publication year,
  • publisher,
  • journal.
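As an illustration of the downloaded record, here is a minimal Python sketch of one publication row and of storing a publication list for one academic. It is not TritaH's code: it uses Python's built-in `sqlite3` module as a stand-in for the SQL Server database, and the `Publication` class, the `store` helper and the `publications` table layout are hypothetical.

```python
import sqlite3
from dataclasses import dataclass, astuple


@dataclass
class Publication:
    """One Google Scholar result with the fields listed above (names are illustrative)."""
    authors: str    # complete list of authors, as returned by the search
    cites: int      # number of citations
    link: str       # publication link
    title: str      # publication title
    year: int       # publication year
    publisher: str
    journal: str


def store(conn: sqlite3.Connection, academic_id: int, pubs: list[Publication]) -> None:
    """Insert the downloaded publication list of one academic (hypothetical schema)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS publications (
               academic_id INTEGER, authors TEXT, cites INTEGER, link TEXT,
               title TEXT, year INTEGER, publisher TEXT, journal TEXT)"""
    )
    conn.executemany(
        "INSERT INTO publications VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        [(academic_id, *astuple(p)) for p in pubs],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory stand-in for the real database
store(conn, 1, [
    Publication("A Rossi, B Bianchi", 12, "http://example.org/1", "Paper one", 2010, "Elsevier", "J. Example"),
    Publication("B Bianchi, A Rossi", 3, "http://example.org/2", "Paper two", 2011, "Springer", "J. Sample"),
])
total = conn.execute("SELECT SUM(cites) FROM publications WHERE academic_id = 1").fetchone()[0]
print(total)
```

Keeping one row per publication makes every parameter below a query or a pass over that academic's rows; with the made-up data above, the total number of citations is 15.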

TritaH (Scholar H-Index Batch Calculator) can download into a database all the publication data for all Italian academics (about 57,000). The list of Italian academics was downloaded from the Cineca (MIUR) website. For each academic, the software calculates the following parameters, suggested by ANVUR [1] (National Agency for the Evaluation of Universities and Research Institutes) for the evaluation of Italian academics:

  • h-index.
  • H-Normage - an h-index normalised by age: when calculating the h-index, the number of citations of each publication is divided by its age, where age depends on the publication year as follows: if the year is 2012 the age is 1, if the year is 2011 the age is 2, and so on.
  • H-AutoSpecial - an h-index normalised by author position: when calculating the h-index, the number of citations of each publication is multiplied by a rate, calculated as follows: the rate is 1 if the author is first or last in the authors list, 0.5 if the author is in the list but neither first nor last, and 0 if the author is not in the authors list. Valid only if you tick the option 'Complete Authors List'.
  • Total number of citations: the sum of the citations of all the publications downloaded for the selected academic.
  • Papers over the last 10 years: the total number of publications over the last 10 years. For example, in 2012 it is the total number of publications from 2002 to 2012.
  • H-IF Index.
  • Total number of publications.
  • Academic Age = (2012 - First_Publication_Year) + 1. The data sometimes contain errors (for example, a publication year of 1792); to overcome this problem, only publications from years greater than 1960 are considered.
  • H-Index Normalised by Academic Age - the H-Index divided by the Academic Age.
  • Total Number of Citations Normalised by Academic Age - the Total Number of Citations divided by the Academic Age.
  • Hc-Index - the contemporary H-Index [2], basis of the H-Normage [3], in which the citations are weighted by a factor of 4 to favour more recent publications.
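The parameters above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not TritaH's implementation: `CURRENT_YEAR` is fixed to 2012 as in the definitions, an author is matched by exact string comparison against the complete authors list (a hypothetical simplification of how names appear in Google Scholar), the normalised scores are left unrounded, and all function names are illustrative. The H-IF Index is omitted because this page does not define it.

```python
from dataclasses import dataclass

CURRENT_YEAR = 2012      # reference year used in the definitions above
FIRST_VALID_YEAR = 1961  # only years greater than 1960 count for Academic Age


@dataclass
class Pub:
    year: int
    cites: int
    authors: list[str]  # complete authors list, in order (hypothetical name format)


def h_index(scores) -> int:
    """Largest h such that h publications have a score of at least h."""
    h = 0
    for rank, score in enumerate(sorted(scores, reverse=True), start=1):
        if score < rank:
            break
        h = rank
    return h


def age(year: int) -> int:
    """Publication age: 2012 -> 1, 2011 -> 2, and so on."""
    return CURRENT_YEAR - year + 1


def h_normage(pubs) -> int:
    # H-Normage: citations divided by publication age (left unrounded here)
    return h_index(p.cites / age(p.year) for p in pubs)


def author_rate(author: str, authors: list[str]) -> float:
    # 1 if first or last author, 0.5 if elsewhere in the list, 0 if absent
    if author not in authors:
        return 0.0
    if author in (authors[0], authors[-1]):
        return 1.0
    return 0.5


def h_autospecial(pubs, author: str) -> int:
    # H-AutoSpecial: meaningful only when the authors list is complete
    return h_index(p.cites * author_rate(author, p.authors) for p in pubs)


def academic_age(pubs) -> int:
    # (2012 - first valid publication year) + 1, skipping implausible years
    years = [p.year for p in pubs if p.year >= FIRST_VALID_YEAR]
    return age(min(years)) if years else 0


def papers_last_10_years(pubs) -> int:
    # 2002..2012 inclusive, following the example in the list above
    return sum(1 for p in pubs if CURRENT_YEAR - 10 <= p.year <= CURRENT_YEAR)


def summary(pubs, author: str) -> dict:
    h = h_index(p.cites for p in pubs)
    total = sum(p.cites for p in pubs)
    a_age = academic_age(pubs)
    return {
        "h_index": h,
        "h_normage": h_normage(pubs),
        "h_autospecial": h_autospecial(pubs, author),
        "total_cites": total,
        "papers_last_10_years": papers_last_10_years(pubs),
        "total_papers": len(pubs),
        "academic_age": a_age,
        "h_per_academic_age": h / a_age if a_age else 0.0,
        "cites_per_academic_age": total / a_age if a_age else 0.0,
    }


pubs = [
    Pub(2012, 10, ["Rossi", "Bianchi"]),
    Pub(2010, 9, ["Verdi", "Rossi", "Neri"]),
    Pub(2008, 6, ["Rossi"]),
    Pub(2005, 2, ["Neri", "Verdi"]),
    Pub(1792, 0, ["Rossi"]),  # implausible year, ignored for Academic Age
]
stats = summary(pubs, "Rossi")
print(stats)
```

For this made-up profile the sketch gives an h-index of 3, an H-Normage of 2, an H-AutoSpecial of 3 (the middle-author paper counts at half weight), an Academic Age of 8 (2005 is the first valid year) and 4 papers in the last 10 years.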
Formula to calculate the Hc-Index:

$$S(i,t) = \frac{4 \cdot C(i,t)}{t - t_i + 1}$$

where:

  • C(i,t) is the number of citations of publication i counted in year t, the year of calculation;
  • t_i is the publication year of publication i.

The Hc-Index is the H-Index of the publications computed with the calculated scores S(i,t) in place of the raw citation counts.

[The calculations are done by rounding up to the next higher integer number]
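A sketch of the Hc-Index in Python, assuming the standard contemporary h-index weighting with γ = 4 and δ = 1, i.e. S(i,t) = 4·C(i,t)/(t − t_i + 1), rounded up to the next integer as stated above, with t = 2012. The value δ = 1, the constant names and the helper functions are assumptions; the example data are made up to mirror the junior/senior contrast described in the notes that follow.

```python
import math

GAMMA = 4     # weighting factor given on the page
DELTA = 1     # exponent on age: an assumption (the usual default)
T_NOW = 2012  # year of calculation


def contemporary_score(cites: int, pub_year: int, t: int = T_NOW) -> int:
    """S(i,t) = GAMMA * C(i,t) / (t - t_i + 1) ** DELTA, rounded up."""
    return math.ceil(GAMMA * cites / (t - pub_year + 1) ** DELTA)


def h_index(scores) -> int:
    """Largest h such that h publications have a score of at least h."""
    h = 0
    for rank, score in enumerate(sorted(scores, reverse=True), start=1):
        if score < rank:
            break
        h = rank
    return h


def hc_index(pubs, t: int = T_NOW) -> int:
    """pubs: iterable of (publication_year, citations) pairs."""
    return h_index(contemporary_score(c, y, t) for y, c in pubs)


junior = [(2012, 1), (2011, 5), (2010, 4), (2009, 3)]
senior = [(1980, 20)] * 6  # six old papers with 20 citations each

for name, pubs in (("junior", junior), ("senior", senior)):
    print(name, h_index(c for _, c in pubs), hc_index(pubs))
```

With these made-up data the junior profile keeps an Hc-Index equal to its h-index (3 and 3), while six 1980 papers with 20 citations each give an h-index of 6 but an Hc-Index of only 3, because each paper's score is ceil(80/33) = 3.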

Notes on Hc-Index

A disadvantage of the h-index is that it cannot decline. That means that academics who “retire” after 10-20 active years of publishing maintain their high h-index even if they never publish another paper. In order to address this issue, the contemporary h-index has been proposed. The contemporary h-index adds an age-related weighting to each cited article, giving (by default; this depends on the parametrization) less weight to older articles. For junior academics the contemporary h-index is generally close to their regular h-index, as most of the papers included in their h-index will be recent. For more established academics there can be a substantial difference between the two indices, indicating that most of the papers included in their h-index were published some time ago. As such, the contemporary h-index often provides a slightly fairer comparison between junior and senior academics than the regular h-index.

What is a Minor Citation? A Minor Citation occurs when the scientist is cited in the text of a publication but does not appear in its authors list. TritaH searches the Google Scholar database with the option "with the exact phrase"; it includes patents and does NOT include Minor Citations. It also filters by area according to the academic's SSD (settore scientifico-disciplinare, the Italian scientific-disciplinary sector) as follows:


        SSD       = FILTER BY AREA
        AGR       = Biology, Chemistry, Social Sciences (and Engineering)
        BIO       = Biology, Medicine; for BIO/10 and BIO/14: Biology, Medicine and Chemistry
        CHIM      = Chemistry and Physics
        FIS       = Physics
        GEO       = Physics, Chemistry, Biology
        ICAR      = Engineering and Social Sciences, Arts and Humanities (SAH)
        INF       = Engineering/Mathematics/Computer Sciences
        ING-IND   = Engineering (+ Biology only for /05)
        ING-INF   = Engineering (+ Biology for /06)
        IUS       = SAH
        L-ANT     = SAH
        L-ART     = SAH
        L-FIL-LET = SAH
        L-LIN     = SAH
        L-OR      = SAH
        MAT       = Engineering and Physics
        M-DEA     = SAH
        MED       = Medicine and Biology
        M-EDF     = Medicine, Biology and SAH
        M-FIL     = SAH
        M-GGR     = SAH, Business and Biology
        M-PED     = SAH
        M-PSI     = Medicine, Biology and SAH
        M-STO     = SAH
        SECS-P    = Business and SAH
        SECS-S    = Business, Engineering and SAH
        SPS       = SAH
        VET       = Medicine and Biology
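The area filter can be expressed as a lookup table keyed by the SSD prefix, plus the per-sector exceptions listed in the table (BIO/10, BIO/14, ING-IND/05, ING-INF/06). A minimal Python sketch, assuming SSD codes in the usual PREFIX/NN form; the area labels are the short names used in the table, and mapping AGR's "Social Sciences" to SAH and INF's "Engineering/Mathematics/Computer Sciences" to a single Engineering label is an interpretation, not something the page states. The names `AREAS_BY_PREFIX`, `EXTRA_AREAS` and `areas_for` are hypothetical.

```python
# Areas per SSD prefix (the part before "/"), transcribed from the table.
AREAS_BY_PREFIX = {
    "AGR": ["Biology", "Chemistry", "SAH", "Engineering"],  # "Social Sciences (and Engineering)"
    "BIO": ["Biology", "Medicine"],
    "CHIM": ["Chemistry", "Physics"],
    "FIS": ["Physics"],
    "GEO": ["Physics", "Chemistry", "Biology"],
    "ICAR": ["Engineering", "SAH"],
    "INF": ["Engineering"],  # Engineering/Mathematics/Computer Sciences
    "ING-IND": ["Engineering"],
    "ING-INF": ["Engineering"],
    "IUS": ["SAH"],
    "L-ANT": ["SAH"],
    "L-ART": ["SAH"],
    "L-FIL-LET": ["SAH"],
    "L-LIN": ["SAH"],
    "L-OR": ["SAH"],
    "MAT": ["Engineering", "Physics"],
    "M-DEA": ["SAH"],
    "MED": ["Medicine", "Biology"],
    "M-EDF": ["Medicine", "Biology", "SAH"],
    "M-FIL": ["SAH"],
    "M-GGR": ["SAH", "Business", "Biology"],
    "M-PED": ["SAH"],
    "M-PSI": ["Medicine", "Biology", "SAH"],
    "M-STO": ["SAH"],
    "SECS-P": ["Business", "SAH"],
    "SECS-S": ["Business", "Engineering", "SAH"],
    "SPS": ["SAH"],
    "VET": ["Medicine", "Biology"],
}

# Extra areas for the specific sectors singled out in the table.
EXTRA_AREAS = {
    "BIO/10": ["Chemistry"],
    "BIO/14": ["Chemistry"],
    "ING-IND/05": ["Biology"],
    "ING-INF/06": ["Biology"],
}


def areas_for(ssd: str) -> list[str]:
    """Return the areas to filter by for an SSD code such as 'BIO/10'."""
    code = ssd.strip().upper()
    prefix = code.split("/")[0]
    base = AREAS_BY_PREFIX.get(prefix)
    if base is None:
        raise KeyError(f"no area filter defined for SSD {ssd!r}")
    return base + EXTRA_AREAS.get(code, [])  # new list; the table is not mutated


print(areas_for("BIO/10"))
```

A table-driven lookup keeps the per-sector rules as data, so adding a sector or an exception does not require changing the search logic.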

The search is normally done by name and surname, except for CHIM and ING, where it is done by the initial of the first name and the surname.

Because many Italian academics have two or more first names, and searching with all of them has proved completely wrong in many cases, TritaH considers only the first name to reduce this kind of error. The surname, by contrast, is taken as it is, even when it consists of two or more surnames.
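The name rules above can be sketched as a small helper that builds the exact-phrase query. This is a hypothetical illustration: the function name `search_phrase` is made up, the initial is written without a full stop (as in "CM Croce" in the comparison table), and how TritaH actually submits the phrase to Google Scholar is not described on this page.

```python
# Sector prefixes searched by initial of the first name plus surname.
INITIAL_SEARCH_PREFIXES = ("CHIM", "ING")


def search_phrase(first_names: str, surname: str, ssd: str) -> str:
    """Build the quoted ("with the exact phrase") search string for one academic."""
    first = first_names.split()[0]   # keep only the first of several given names
    prefix = ssd.strip().upper().split("/")[0]
    if prefix.startswith(INITIAL_SEARCH_PREFIXES):
        first = first[0]             # CHIM and ING-*: use the initial only
    return f'"{first} {surname.strip()}"'  # surname kept as it is, even if compound


print(search_phrase("Maria Grazia", "Rossi", "MED/09"))
print(search_phrase("Carlo", "De Luca", "CHIM/03"))
```

The first call keeps only "Maria", and the second uses the initial because the sector is CHIM, while the compound surname "De Luca" is left intact.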

Calculation of the errors

The main sources of error are homonyms and double names.

Homonyms - the software cannot distinguish publications by different authors with the same name and surname, or with the same initial and surname. To reduce this kind of error, the search is filtered by area.

Double names - to reduce this error, the search considers only the first name.

For the few SSDs already examined, such as BIO, MED and M-PSI, the error has been calculated to be less than 1 point.



Top 100 homonymous surnames among Italian academics, downloaded from the Cineca (MIUR) website (Nov 2010)

Surname(COUNT): ROSSI(214), RUSSO(141), FERRARI(104), ROMANO(92), BIANCHI(88), RICCI(79), CONTI(68), COLOMBO(66), GIORDANO(63), BRUNO(62), GRECO(60), MARINO(60), ESPOSITO(59), RIZZO(58), COSTA(57), DE LUCA(52), GALLO(49), LOMBARDI(48), MARCHETTI(47), RINALDI(47), MANCINI(47), NERI(46), MARINI(46), LONGO(44), BARBIERI(43), FONTANA(42), CARUSO(42), MARTINELLI(41), LOMBARDO(41), GRASSI(41), MORETTI(40), LEONE(40), GALLI(40), FERRETTI(39), D ANGELO(39), VILLA(38), SANTORO(38), DE ROSA(37), MONTI(37), CONTE(36), MARIANI(36), FERRARA(36), PINTO(35), PALUMBO(35), ROMEO(35), DE ANGELIS(35), GENTILE(35), MONTANARI(35), GRASSO(34), BARONE(34), FABBRI(34), LEONARDI(34), SERRA(34), VALENTINI(34), SANTINI(33), MESSINA(33), RIZZI(33), POLI(33), PELLEGRINI(32), MARTINI(32), BIANCO(32), CARBONE(32), VITALE(32), COPPOLA(32), MORELLI(31), BIONDI(31), FERRANTE(31), GATTI(31), DE SANTIS(31), D ALESSANDRO(31), PARISI(30), PIAZZA(30), SALERNO(30), MONACO(29), VENTURA(29), BERTI(29), AMATO(29), CAPUTO(29), ANTONELLI(28), PUGLIESE(28), GIULIANI(28), SILVESTRI(28), NEGRI(28), MOTTA(28), MARINELLI(27), D AGOSTINO(27), CATALANO(27), VILLANI(27), FRANCO(27), ORLANDI(27), VALENTE(27), BERNARDI(27), PALMIERI(27), MAGGI(26), BRUNI(26), MOLINARI(26), CASTELLI(26), ANGELINI(26), SANNA(26), VALENTI(26)



Conclusions

This is ONLY an automatic calculation: although it provides good statistical results when considering an entire area of scientists, human review is highly recommended when evaluating single individuals.


Online version

There is an online version of TritaH (Scholar H-Index Batch Calculator).


References

Google Scholar database is a freely accessible web search engine that indexes the full text of scholarly literature across an array of publishing formats and disciplines.

Scholar H-Index Calculator is a Firefox add-on, developed by G.B. Ianni, that displays the h-index, g-index, e-index and other impact measures for the submitted query on top of Google Scholar result pages.

Publish or Perish is a software program, designed by Anne-Wil Harzing, that retrieves and analyses academic citations. It uses Google Scholar to obtain the raw citations.
