Is Elo valid in regional cases?

From The Rating of Chessplayers, Past and Present by Arpad Elo.

***********

Arpad Elo was born August 25, 1903. . . the old rating system, was producing inconsistencies. By 1959, when the matter had become very critical, USCF called upon Professor Elo's unique combination of skills and interests, and he became a volunteer consultant. His investigation and development of scientific rating theory and practice began then and absorbed increasing portions of his time and attention ever since, as the Elo Rating System, after restoring confidence in ratings, spread first over the US and then over the international chess scene.

pp. 229-230.

* * *

1.31 From general experience in sports we know that the stronger player does not invariably outperform the weaker. A player has good days and bad, good tournaments and bad. By and large, at any point in his career, a player will perform around some average level. Deviations from this level occur, large deviations less frequently than small ones. These facts suggest the basic assumption of the Elo system. It is best stated in the formal terms of statistics:

The many performances of an individual will be normally distributed, when evaluated on an appropriate scale. Extensive investigation (Elo 1965, McClintock 1977) bore out the validity of the assumption. Alternative assumptions are discussed in 8.72.
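The 1.31 assumption is enough to derive a probability function. The following is a minimal Python sketch. It assumes that each player's performances are normally distributed around his rating with a standard deviation of 200 points. SIGMA is our assumption and is not quoted in this excerpt.

The chance that the higher-rated player outperforms the other follows from the difference of two independent normal variables. A quick simulation confirms it. With this SIGMA, the computed values agree with the "normal" column of the table of P and p further down to within about .002 (for example, .760 at 200 points).

```python
import math
import random

SIGMA = 200.0  # assumed standard deviation of one player's performances

def win_probability_normal(d: float, sigma: float = SIGMA) -> float:
    """Chance that player A outperforms player B when A is rated d points higher.

    Each performance ~ Normal(rating, sigma), so the difference of the two
    performances ~ Normal(d, sigma*sqrt(2)) and P = Phi(d / (sigma*sqrt(2))).
    With Phi(x) = (1 + erf(x/sqrt(2))) / 2 this reduces to the line below.
    """
    return 0.5 * (1.0 + math.erf(d / (2.0 * sigma)))

def simulate(d: float, trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo check: draw both performances and count how often A's is higher."""
    rng = random.Random(seed)
    wins = sum(rng.gauss(d, SIGMA) > rng.gauss(0.0, SIGMA) for _ in range(trials))
    return wins / trials

for d in (0, 100, 200, 400):
    print(d, round(win_probability_normal(d), 3), round(simulate(d), 3))
```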

8.72 Underlying any rating system is some assumption, stated or implied, about the distribution of performances of the rating pool members, and the degree of realism in the assumption bears directly on the system's chances of success. The three basic distribution patterns that have figured importantly in rating system theory and practice are illustrated below. Each is given in the particular form treated in this book, specified by equations (13), (44), and (54). Each pattern may, of course, be varied in form by varying the constants in its equation.

Rectangular distribution describes no natural phenomenon, and certainly not chess performances, but both the normal and the Verhulst (logistic) distributions occur frequently in nature. In all the data over more than a hundred years, they have provided reasonably serviceable descriptions of the distribution of chess performances in pools where no artificial influence was effective. Which of these is closer to reality? What would a statistical test show? No such test has ever been made [Elo's book first appeared in 1978, with a second edition in 1986, so such tests may have been made since then], but the theoretical requirements for a truly definitive examination are explored at 8.74.

8.73 Probability Functions

The probability function of a rating system determines the working formulae, and now practical considerations begin to demand attention. The three probability functions are illustrated below, together with comparative tables of the resulting P and p. For the linear function, D is limited to 350 E, as in the application of formula (12) described at 1.84.

[Figure: scoring probability (%) against rating difference in E (-500 to +500) for the linear function with cutoff and for the normal and logistic functions. The normal and logistic curves are nearly identical, so they cannot be told apart in a text rendering.]

A short comparative tabulation of P and p for the higher-rated player under the three functions follows at the top of the next page [reproduced below, after 8.75]. The values of the odds p are obtained from the percentage expectancies P using the relationship p = P/(1 - P).
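The odds column is simple arithmetic on the expectancy column. Below is a minimal Python sketch of the stated relation p = P/(1 - P). It also includes the inverse, P = p/(1 + p), which is our own rearrangement rather than something quoted from the book. The function names are hypothetical.

```python
def odds_from_expectancy(P: float) -> float:
    """Odds p = P / (1 - P), the relationship stated in the text."""
    if not 0.0 <= P < 1.0:
        raise ValueError("P must lie in [0, 1)")
    return P / (1.0 - P)

def expectancy_from_odds(p: float) -> float:
    """Inverse, solving p = P/(1 - P) for P: P = p / (1 + p)."""
    if p < 0.0:
        raise ValueError("odds must be non-negative")
    return p / (1.0 + p)

print(round(odds_from_expectancy(0.75), 2))   # 3.0, the linear entry at 200 points
print(round(odds_from_expectancy(0.938), 2))  # 15.13, the linear entry beyond the cutoff
```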

For small differences the results are indistinguishable. Some theoretical considerations in 8.41 support the Verhulst as a better representation of that elusive reality, but the difference is slight. [Many current Elo implementations, such as the USCF's, use the Verhulst (logistic) function, while FIDE's conversion tables are still based on the normal curve.] If all three functions are acceptably realistic, practical considerations become decisive. The linear function may be expressed by a simple formula for R(p) or R(n), readily usable by players and organizers. The advantage is superficial, however, because the formula lacks the sophistication and flexibility to express the limitation on D and the deflation controls required for the integrity of the ratings.
The normal function requires the use of tables, an inconvenience to both the paper-and-pencil statistician and the electronic computer. Still, normal probability tables are readily available everywhere, and the normal distribution and probability curves are familiar statistical concepts of long and respected standing. Both the normal and the logistic adapt naturally to control processes and conform to statistical and probability laws. The logistic function better reflects large deviations in an extended series. Since it is expressible by an equation, it may be programmed without storing a table or using numerical methods to evaluate numerous definite integrals.
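To make the comparison concrete, here is a minimal Python sketch of the three probability functions. None of the following assumptions is quoted from the excerpt:

- The normal curve uses a per-player standard deviation of 200 points.
- The logistic uses the common base-10 form 1/(1 + 10^(-D/400)).
- The linear function is P = 0.5 + D/800, a slope inferred from the table's linear column, with D capped at 350 E as the text states.

With these choices, the printed values agree with the table of P and p further down to within about .002.

```python
import math

def p_normal(d: float, sigma: float = 200.0) -> float:
    """Normal: P = Phi(D / (sigma*sqrt(2))); sigma = 200 is an assumption."""
    return 0.5 * (1.0 + math.erf(d / (2.0 * sigma)))

def p_logistic(d: float) -> float:
    """Logistic (Verhulst) in the common base-10, 400-point form (assumed)."""
    return 1.0 / (1.0 + 10.0 ** (-d / 400.0))

def p_linear(d: float, cutoff: float = 350.0) -> float:
    """Linear with cutoff: |D| limited to 350 E, then P = 0.5 + D/800.

    The slope 1/800 is inferred from the table's linear column, not quoted.
    """
    d = max(-cutoff, min(cutoff, d))
    return 0.5 + d / 800.0

print("D     normal  logistic  linear")
for d in (0, 100, 200, 300, 400, 600):
    print(f"{d:<5} {p_normal(d):.3f}   {p_logistic(d):.3f}     {p_linear(d):.3f}")
```

The closed-form logistic needs no table lookup or numerical integration, which illustrates the practical point made in the text.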

(1) Rp = Rc + Dp

Rp is the performance rating.

Rc is the (average) competition rating.

Dp is to be read as the difference based on the percentage score P, which is obtained from the curve or table.
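Equation (1) is straightforward to compute once a curve is chosen. The following is a minimal Python sketch for a hypothetical four-game event. It assumes the logistic curve, one of the functions discussed above (8.75 allows any reasonable one).

Inverting P = 1/(1 + 10^(-D/400)) gives Dp = 400 log10(P/(1 - P)). In other words, Dp is 400 times the base-10 logarithm of the odds p.

```python
import math

def performance_rating(opponent_ratings: list[float], score: float) -> float:
    """Equation (1): Rp = Rc + Dp, with Dp read off the (assumed) logistic curve."""
    n = len(opponent_ratings)
    if n == 0:
        raise ValueError("need at least one game")
    P = score / n                        # percentage score
    if not 0.0 < P < 1.0:
        # A 0% or 100% score has no finite Dp on this curve.
        raise ValueError("Dp is unbounded for a 0% or 100% score")
    Rc = sum(opponent_ratings) / n       # average competition rating
    Dp = 400.0 * math.log10(P / (1.0 - P))
    return Rc + Dp

# Hypothetical event: 3 points from 4 games against opponents averaging 1800.
print(round(performance_rating([1700, 1750, 1850, 1900], 3.0)))  # 1991
```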

(2) Rn = Ro + K(W – We)

Rn is the new rating after the event.

Ro is the pre-event rating.

K is the rating point value of a single game score.

W is the actual game score, each win counting 1, each draw .5.

We is the expected game score based on Ro.
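Here is equation (2) as code. This minimal Python sketch again assumes the logistic curve for We. It uses K = 20 purely as an illustrative value, since the excerpt does not fix K. The function names are hypothetical.

```python
def expected_score(ro: float, opponent: float) -> float:
    """We for a single game, from the (assumed) logistic curve."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - ro) / 400.0))

def new_rating(ro: float, opponents: list[float], results: list[float],
               k: float = 20.0) -> float:
    """Equation (2): Rn = Ro + K(W - We).

    results: 1 for a win, 0.5 for a draw, 0 for a loss.
    k = 20 is only an illustrative value.
    """
    if len(opponents) != len(results):
        raise ValueError("one result per opponent")
    W = sum(results)                                       # actual game score
    We = sum(expected_score(ro, r) for r in opponents)     # expected game score
    return ro + k * (W - We)

# A 1600 player beats a 1500 and a 1700 player and loses to a 1600 player:
# W = 2, We = 0.640 + 0.360 + 0.5 = 1.5, so Rn = 1600 + 20 * 0.5.
print(round(new_rating(1600, [1500, 1700, 1600], [1, 1, 0]), 1))  # 1610.0
```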

8.75 Equations (1) and (2) are the basic formulae of the Elo system, and they are equally serviceable with other scales and other probability functions. They may be used with logistic probability provided Dp and We are determined from the logistic curve rather than the normal curve. They may even be used with ratio scales, provided these are logarithmic and Dp and We are taken from the appropriate function. Equation (2), indeed, is such a direct statement of fact that it could have been accepted as an axiom; its logic is evident without derivation. The simplifying assumptions made in the course of its development turn out not to be critical in the final analysis. Even the assumption of normal distribution, the vehicle for the entire derivation, is superfluous. Equation (2) with any reasonable form of the probability function could be taken as the starting point of a rating system, as is done in 8.4, with We and Dp calculated from the function. Continued application of (2) eventually generates rating differences conformant to the selected function. The coefficient K need not be related to the slope of the probability curve, but may be taken simply as a factor to control the sensitivity of (2) to changes in performance from one event to the next.
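A small simulation can check the claim that continued application of (2) drives rating differences toward the selected function. This is a sketch under several assumptions:

- Two hypothetical players have a true difference of 200 points.
- Only decisive games are played, with results drawn from the logistic curve.
- Both ratings are updated with equation (2) after every game.

Comparing two K values shows K's role as a sensitivity factor. A larger K should leave the average difference near 200 but make it fluctuate more.

```python
import random
import statistics

def expected(d: float) -> float:
    """Logistic expectancy for a rating difference d (the 'selected function' here)."""
    return 1.0 / (1.0 + 10.0 ** (-d / 400.0))

def simulate_ratings(true_diff: float = 200.0, k: float = 16.0,
                     games: int = 20_000, seed: int = 7) -> tuple[float, float]:
    """Return (mean, spread) of the rating difference over the second half of the run."""
    rng = random.Random(seed)
    ra = rb = 1500.0                     # both hypothetical players start level
    p_true = expected(true_diff)         # A's real chance of winning each game
    diffs = []
    for _ in range(games):
        w = 1.0 if rng.random() < p_true else 0.0   # A's actual score (no draws)
        we = expected(ra - rb)                       # A's expected score from current ratings
        ra += k * (w - we)                           # equation (2) for A
        rb -= k * (w - we)                           # B's W - We is the negative of A's
        diffs.append(ra - rb)
    tail = diffs[games // 2:]                        # discard the first half as warm-up
    return statistics.fmean(tail), statistics.pstdev(tail)

for k in (8.0, 32.0):
    mean, spread = simulate_ratings(k=k)
    print(f"K={k:g}: mean difference {mean:.0f}, spread {spread:.0f}")
```

The fixed seed makes each run reproducible. The exact figures depend on the seed, so none are quoted here.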

pp. 12, 13, 155-159.

* * *

Percentage expectancy P and odds p of the higher-rated player, by rating difference in E:

Rating      Percentage expectancy P     Odds p
difference  normal  logistic  linear    normal  logistic  linear
0           .500    .500      .500      1.00    1.00      1.00
50          .570    .571      .563      1.33    1.33      1.29
100         .638    .640      .625      1.76    1.78      1.67
150         .702    .703      .688      2.36    2.37      2.21
200         .760    .760      .750      3.17    3.17      3.00
250         .812    .808      .812      4.32    4.21      4.32
300         .856    .849      .875      5.94    5.62      7.00
350         .893    .882      .938      8.35    7.47      15.13
400         .923    .909      .938      11.99   9.99      15.13
450         .945    .930      .938      17.2    13.3      15.1
500         .961    .947      .938      24.6    17.9      15.1
600         .983    .969      .938      57.8    31.3      15.1

p. 158.
