Wikipedia does not have to be truthful, but it is important that its information is confirmed by reliable sources. Moreover, one of the most important factors affecting the quality of Wikipedia articles is the availability of reliable sources. By following the links in the references (footnotes), readers can check facts or find more information on the topic described.
BestRef shows the importance of information sources in the references of Wikipedia articles in different language versions. The data were extracted using a complex method based on Wikimedia dumps from October 2023. To find the most important sources, we used information about over 300 million references in Wikipedia articles. More details can be found in the scientific publications:
Frequency of source usage in the F-model is the number of references that contain the analyzed domain in their URL. This approach has been commonly used in previous research. The F-model counts the total number of appearances of such references, i.e., if the same source is cited 3 times, its frequency equals 3. The following equation shows the calculation for the F-model, where s is the source, n is the number of considered Wikipedia articles, and C_s(i) is the number of references using source s (e.g., domain in URL) in article i.
\[F(s)=\sum_{i=1}^n C_s(i)\]
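A minimal sketch of the F-model in Python may help make the summation concrete. It assumes a hypothetical input that maps each article title to the URLs of its references, and it identifies a source by the URL host with a leading "www." stripped; that normalization is an illustrative assumption, not necessarily how BestRef groups domains. The `domain_of` and `f_model` names and the toy data are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Host part of a reference URL; dropping a leading 'www.' is an assumption."""
    host = urlparse(url).hostname or ""  # .hostname is already lowercased
    return host.removeprefix("www.")


def f_model(articles: dict[str, list[str]]) -> Counter:
    """F(s) = sum over articles i of C_s(i): every reference to domain s counts once."""
    scores: Counter = Counter()
    for ref_urls in articles.values():
        for url in ref_urls:
            scores[domain_of(url)] += 1
    return scores


# Hypothetical toy input: article title -> URLs of its references.
articles = {
    "Article A": [
        "https://www.nature.com/articles/x",
        "https://doi.org/10.1000/abc",
        "https://www.nature.com/articles/y",
    ],
    "Article B": ["https://nature.com/articles/z"],
}

f_scores = f_model(articles)
print(f_scores.most_common())  # [('nature.com', 3), ('doi.org', 1)]
```

Because every reference adds exactly 1, citing the same source three times in one article contributes 3 to its score, as the F-model definition requires.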
The PR-model uses the cumulative pageviews of an article (over the last 12 months) divided by the total number of references in that article. Compared to the previous model, it additionally takes into account the popularity of the Wikipedia article and the visibility of the references that use the analyzed source. This model assumes that, in general, the more references an article has, the less visible any specific reference is. The following equation shows the calculation of the measure using the PR-model, where s is the source, n is the number of considered Wikipedia articles, C(i) is the total number of references in article i, C_s(i) is the number of references using source s (e.g., domain in URL) in article i, and V(i) is the cumulative pageviews value of article i. Please note that inflated pageview values for some Wikipedia articles were reduced.
\[PR(s)=\sum_{i=1}^n {{V(i)} \over {C(i)}} \times C_s(i)\]
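The PR-model can be sketched by giving each reference an equal share V(i)/C(i) of its article's pageviews and then summing those shares per domain. This is a minimal illustration under assumptions: the hypothetical `Article` record keeps C(i) separately from the list of reference URLs (some references may have no URL), domains are normalized as in a simple host lookup, and the reduction of inflated pageview values mentioned above is not reproduced. The toy numbers are invented.

```python
from collections import defaultdict
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Article:
    total_refs: int      # C(i): all references in the article, with or without a URL
    ref_urls: list[str]  # URLs of those references that have one
    pageviews: float     # V(i): cumulative pageviews over the last 12 months


def domain_of(url: str) -> str:
    """Host part of a reference URL; dropping a leading 'www.' is an assumption."""
    host = urlparse(url).hostname or ""
    return host.removeprefix("www.")


def pr_model(articles: list[Article]) -> dict[str, float]:
    """PR(s) = sum over articles i of V(i) / C(i) * C_s(i)."""
    scores: dict[str, float] = defaultdict(float)
    for art in articles:
        if art.total_refs == 0:
            continue  # no references, nothing to attribute
        per_ref = art.pageviews / art.total_refs  # V(i) / C(i): visibility share of one reference
        for url in art.ref_urls:
            # Adding the share once per matching reference yields the C_s(i) factor.
            scores[domain_of(url)] += per_ref
    return dict(scores)


# Hypothetical toy input (not real BestRef data).
articles = [
    Article(
        total_refs=4,  # one of the four references has no URL
        ref_urls=["https://nature.com/a", "https://doi.org/1", "https://www.nature.com/b"],
        pageviews=1000.0,
    ),
    Article(
        total_refs=2,
        ref_urls=["https://bbc.co.uk/x", "https://nature.com/c"],
        pageviews=500.0,
    ),
]

pr_scores = pr_model(articles)
print(pr_scores)  # {'nature.com': 750.0, 'doi.org': 250.0, 'bbc.co.uk': 250.0}
```

Note that the reference without a URL still counts toward C(i) in the first article, which lowers the share of every other reference there, in line with the assumption that more references mean less visibility for each one.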
Since the pageview count of an article reflects its popularity among readers, another important measure addresses its popularity among authors, i.e., the number of users who decided to add content or make changes to the article. Building on the assumptions of the previous model, the AR-model takes authors into account. It is described by the following equation, where s is the source, n is the number of considered Wikipedia articles, C(i) is the total number of references in article i, C_s(i) is the number of references using source s (e.g., domain in URL) in article i, and E(i) is the total number of registered authors (non-bots) of article i.
\[AR(s)=\sum_{i=1}^n {{E(i)} \over {C(i)}} \times C_s(i)\]
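The AR-model has the same structure as the PR-model, with the number of registered non-bot authors E(i) in place of the pageviews V(i). The sketch below makes that explicit with a hypothetical generic helper, `weighted_model`, that takes the per-article weight as a function; `ar_model` then just supplies E(i). The `Article` record, the domain normalization, and the toy data are illustrative assumptions, not BestRef's actual implementation.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable
from urllib.parse import urlparse


@dataclass
class Article:
    total_refs: int      # C(i): all references in the article
    ref_urls: list[str]  # URLs of those references that have one
    authors: int         # E(i): registered, non-bot authors


def domain_of(url: str) -> str:
    """Host part of a reference URL; dropping a leading 'www.' is an assumption."""
    host = urlparse(url).hostname or ""
    return host.removeprefix("www.")


def weighted_model(
    articles: list[Article], weight: Callable[[Article], float]
) -> dict[str, float]:
    """Sum over articles i of weight(i) / C(i) * C_s(i), for every domain s."""
    scores: dict[str, float] = defaultdict(float)
    for art in articles:
        if art.total_refs == 0:
            continue  # avoid dividing by zero for articles without references
        per_ref = weight(art) / art.total_refs
        for url in art.ref_urls:
            scores[domain_of(url)] += per_ref
    return dict(scores)


def ar_model(articles: list[Article]) -> dict[str, float]:
    """AR(s): the article weight is E(i), the number of registered non-bot authors."""
    return weighted_model(articles, lambda art: art.authors)


# Hypothetical toy input (not real BestRef data).
articles = [
    Article(
        total_refs=3,
        ref_urls=["https://nature.com/a", "https://doi.org/1", "https://nature.com/b"],
        authors=6,
    ),
    Article(total_refs=1, ref_urls=["https://doi.org/2"], authors=5),
]

ar_scores = ar_model(articles)
print(ar_scores)  # {'nature.com': 4.0, 'doi.org': 7.0}
```

In this toy example, doi.org outranks nature.com under the AR-model even though nature.com has more references overall, because doi.org is the only reference in an article edited by several authors.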
There is also a BestRef browser extension available on the Chrome Web Store. See a short video on how it works:
Information about the reliability of sources can help improve models for quality assessment of Wikipedia articles. This can be especially useful when comparing inconsistent facts across language versions of Wikipedia articles. In addition, one promising area of upcoming research is the creation of publicly available tools that could recommend the best sources for individual statements and for selected topics in different language versions of Wikipedia.
More information on research in this field can be found on the WikiQ project.