Data for a Regression
I'd love to do a simple regression that looks at average FIFA rankings over the last X years vs population, per-capita GDP and some measure of overall social health in countries with a domestic league.
I just looked at the population numbers solo and I have a feeling the fit would be exceptionally strong.
I'm wondering if the number of years since the founding of a domestic professional league would be more relevant.
It might be, but in that case you're beginning to come perilously close to running into a common problem:
Causes need to precede their effects. In other words, using that guideline, what you may actually be showing is that when a country starts to get good, they then start to form professional leagues. What we're looking for are the causes that make a country good in the first place.
Also of concern is that we're not necessarily trying to determine who the best countries are (we can simply use results for that), but rather we're trying to ascertain how the natural resources at the country's disposal relate to overall team quality. It's an important question, because an accurate answer allows us to look at who is outperforming expectations (and why) and who is underperforming expectations (and why). The European or South American country that outperforms expectations by the largest amount is certainly a country whose development methods are well worth investigating.
I ran FIFA points (used to determine rankings) vs population in millions and per capita GDP and got squadoosh.
GDP is actually correlated, but only has an adjusted R squared of .12 or so. Population doesn't explain anything. At all.
Unless I'm doing something bass-ackward of course.
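For anyone who wants to check a result like this themselves, here is a minimal sketch of the same kind of fit: an ordinary least-squares regression of points on population and per-capita GDP, reporting R squared and adjusted R squared. The data are randomly generated placeholders, not real FIFA or economic figures, and the names (fit_ols, population_m, gdp_per_capita, fifa_points) are made up for illustration.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept.

    Returns (coefficients, R^2, adjusted R^2); coefficients[0] is the intercept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape                          # n observations, p predictors
    A = np.column_stack([np.ones(n), X])    # prepend the intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return beta, r2, adj_r2

# Placeholder data (assumption): 100 made-up "countries".
rng = np.random.default_rng(0)
n = 100
population_m = rng.lognormal(mean=2.5, sigma=1.2, size=n)
gdp_per_capita = rng.lognormal(mean=9.0, sigma=1.0, size=n)
fifa_points = 400 + 0.01 * gdp_per_capita + rng.normal(0, 150, size=n)

X = np.column_stack([population_m, gdp_per_capita])
beta, r2, adj_r2 = fit_ols(X, fifa_points)
print("coefficients:", beta)
print("R^2:", r2, "adjusted R^2:", adj_r2)
```

Swapping in real columns for the three placeholder arrays is all it takes to rerun the comparison.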
Also...
I added a field based on a metric that rates the strength of a country's institutions (basically "rule of law" and "lack of corruption"). Rank ordered 1, 2, 3 etc from good to bad.
My problem is that there's a positive correlation! (Which seems to suggest that the worse a country's institutions the better its football team, on average).
Argh
My next step is to add in a variable for how many years it has been since they played their first international match, and see if that's worth anything.
I'm wondering if the number of years since the founding of a domestic professional league would be more relevant. At least in the case of the US and Japan, their international standing changed dramatically after the founding of MLS and the J-league, respectively. The US played their first friendlies back in the 19th century.
I'm guessing that your rankings must be pretty different than FIFA's?
I'm only doing the top 100, but running your variables v FIFA points still doesn't give a whole lot. Though it's better than the ones I started with.
When I get a chance, I'll run the ones I'm using against FIFA's and see what I get.
It might also help to take the natural log of the population rather than the raw population.
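A quick sketch of what the log transform does. The data below are made up, and deliberately built so that points depend on the log of population, so this only shows the mechanics; whether real rankings behave this way is exactly what would need testing.

```python
import numpy as np

# Placeholder data (assumption): populations are heavily right-skewed,
# and points are generated from log(population) plus noise.
rng = np.random.default_rng(1)
population_m = rng.lognormal(mean=2.5, sigma=1.5, size=150)
points = 300 + 120 * np.log(population_m) + rng.normal(0, 100, size=150)

# Correlation of points with raw population vs. with log population.
r_raw = np.corrcoef(population_m, points)[0, 1]
r_log = np.corrcoef(np.log(population_m), points)[0, 1]

print("r with raw population:", r_raw)
print("r with log population:", r_log)
```

With raw population a handful of very large countries dominate the fit; taking the log pulls them back toward the rest of the sample.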
I've run them against my new rating system, and now the R is all the way up to around .85. 8 of the top 9 teams in my rankings were also in the top 9 when projecting using GDP, Population and Confed (the USA didn't make the top 9 in the rankings, but did in the projections, and the Czech Republic did make the top 9 of the rankings but not the projections).
Sorry, Voros, but you've just set off my *#*#*#*#*#*#*#*# detector. A regression analysis just establishes that a variable and a response are correlated. It provides no information as to which is the cause and which is the effect.
Correct, so further common sense guidelines need to be enforced. Clearly the strength of a country's national side won't affect that country's population, so you don't really have to worry about reverse causation.
If you have correlation, you need to ask a few questions to come up with theories on causation:
a) Is the correlation significant?
b) Do the causes precede their effects?
c) Have you ruled out as many "third variables" as possible, so that you don't have some unknown third variable that has a causal relationship with each of them?
d) Is there a possible mechanism to explain the causation?
So in this case, there are concerns with regard to points b) and c).
Countries without a fair amount of available soccer talent don't just start up professional soccer leagues.
It's not that I don't think having a professional league helps, it's that the particular variable you're promoting carries within it much more soccer-related info than just this cause.
IOW, if having a fair amount of available soccer talent makes a country more likely to form a professional league, then with this variable you clearly have a case where a certain portion of the effect you observe is preceding its cause. This is not really a concern with Population or GDP, or with when the country played its first national game, even though these variables also have other problems that need to be examined.
The reality is that the USA was a better team than more than half of the countries out there before MLS started up in 1996. So the fact that the USA is better than Panama right now needs to be weighed against the fact that the USA was better than Panama before MLS.
So if you want to examine this variable, what you'd need to do is look at how strong the countries are when the league starts, and then how strong they are a certain number of years down the line. You further need to look at a control group of countries that were of similar strength when the league started, but that did not start up a professional league.
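The before/after comparison with a control group can be sketched as a simple difference-in-differences calculation. The ratings below are invented numbers purely to show the arithmetic, and the function name diff_in_diff is hypothetical.

```python
from statistics import mean

def diff_in_diff(treated, control):
    """Average rating change of the treated group minus that of the control group.

    Each argument is a list of (rating_before, rating_after) pairs, one per country.
    """
    treated_change = mean([after - before for before, after in treated])
    control_change = mean([after - before for before, after in control])
    return treated_change - control_change

# Made-up ratings (assumption): countries of similar starting strength.
league_starters = [(1400, 1520), (1350, 1430), (1450, 1500)]  # started a pro league
no_league = [(1410, 1450), (1340, 1370), (1460, 1480)]        # did not

effect = diff_in_diff(league_starters, no_league)
print("estimated league effect (rating points):", effect)
```

The control group's change is subtracted out because countries of that strength may have been improving anyway; only the extra improvement is credited to the league.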
IF your analysis showed a correlation between years with an established domestic league and national team ranking, then I would still argue that the league "causes" the improvement in ranking, and not the other way around.
If I remember correctly, Voros had done this, and probably the most important part was that it was only used inside federations (UEFA, CONCACAF, etc.).
Actually I used "dummy" variables for the Confeds. If a country was in a particular confed it got a "1", otherwise it got a "0". Each of the confeds got a different coefficient except for OFC (you'll trash your matrices if you use all the confeds, for reasons I won't go into).
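For the curious, here is a sketch of the dummy coding and of why one confed has to be left out. It assumes the regression includes an intercept: in that case the six confed columns always add up to the intercept column, so the design matrix loses a rank and the coefficients are no longer uniquely determined. The helper confed_dummies and the sample list are made up for illustration.

```python
import numpy as np

CONFEDS = ["UEFA", "CONMEBOL", "CONCACAF", "CAF", "AFC", "OFC"]

def confed_dummies(confeds, baseline="OFC"):
    """0/1 columns for each confed except the baseline (pass None to keep all six)."""
    cols = [c for c in CONFEDS if c != baseline]
    rows = [[1.0 if c == col else 0.0 for col in cols] for c in confeds]
    return np.array(rows), cols

# Made-up sample of countries, listed by confed only.
sample = ["UEFA", "CONMEBOL", "CONCACAF", "CAF", "AFC", "OFC", "UEFA", "CAF"]
ones = np.ones(len(sample))

dropped, _ = confed_dummies(sample)                  # OFC is the baseline
kept_all, _ = confed_dummies(sample, baseline=None)  # all six confeds

with_baseline = np.column_stack([ones, dropped])     # 6 columns
with_all_six = np.column_stack([ones, kept_all])     # 7 columns

print(with_baseline.shape[1], np.linalg.matrix_rank(with_baseline))
print(with_all_six.shape[1], np.linalg.matrix_rank(with_all_six))
```

With OFC dropped, each remaining coefficient reads as that confed's difference from OFC.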
I didn't use FIFA's rankings but I used my own that I had been working on.
I've since redone my rankings to be better, so I probably need to re-do the study.
Anyway the variables were: Confed, Total GDP, and GDP per capita. Using just these three I got an R of .82 and an adjusted R-Squared of .66.
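To see how an R of .82 and an adjusted R-squared of .66 fit together: squaring R gives an R-squared of about .67, and the adjustment then shaves a little off depending on the number of countries and coefficients. Those two counts aren't given in the thread, so the values of n and p below are assumptions picked only to show the arithmetic.

```python
def adjusted_r_squared(r, n, p):
    """Adjusted R^2 from multiple R, sample size n, and p predictors (intercept excluded)."""
    r2 = r ** 2
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

r = 0.82
print("R^2:", r ** 2)

# Assumed counts (not from the thread): 200 countries,
# 7 predictors (5 confed dummies, total GDP, GDP per capita).
print("adjusted R^2:", adjusted_r_squared(r, n=200, p=7))
```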
The new system uses the Bradley-Terry method suggested by someone in another thread, but is modified a bunch of ways to apply to individual game scores.
It also uses a solving algorithm to find the competition multipliers that best fit the source data, feeds those back into the system, gets new ratings, reruns the solving algorithm, and repeats the process until the numbers stabilize. Those multipliers were tested against an independent data sample, and slightly adjusted based on those results and common sense.
So friendlies now count in the ratings, and it has a simple yet powerful method of predicting game results (FINALLY prohibitive favorites don't have too high a chance of being upset). Turks & Caicos probably doesn't want to know what it says Haiti is going to do to them. The biggest shock is that American Samoa is NOT the worst ranked team. (Though they share something in common with the worst ranked team, other than lots of double digit losses).
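For reference, a minimal sketch of the plain Bradley-Terry model, fit by the standard iterate-until-stable update. This is NOT the modified system described above: it uses only win/loss counts, ignores draws, scores and competition multipliers, and the results matrix is made up. The function names are hypothetical.

```python
def fit_bradley_terry(wins, n_iter=500, tol=1e-10):
    """Fit Bradley-Terry strengths from a matrix of head-to-head wins.

    wins[i][j] = number of times team i beat team j.
    Returns strengths normalized to sum to 1.
    Assumes every team has at least one win and one loss.
    """
    n = len(wins)
    p = [1.0 / n] * n
    for _ in range(n_iter):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(n) if j != i)
            new_p.append(total_wins / denom)
        total = sum(new_p)
        new_p = [v / total for v in new_p]
        shift = max(abs(a - b) for a, b in zip(new_p, p))
        p = new_p
        if shift < tol:   # numbers have stabilized
            break
    return p

def win_probability(p_i, p_j):
    """Bradley-Terry chance that a team of strength p_i beats one of strength p_j."""
    return p_i / (p_i + p_j)

# Made-up results for three teams A, B, C (rows beat columns).
wins = [
    [0, 3, 4],
    [1, 0, 3],
    [1, 2, 0],
]
strengths = fit_bradley_terry(wins)
print("strengths:", strengths)
print("P(A beats C):", win_probability(strengths[0], strengths[2]))
```

The prediction step is just the ratio of one team's strength to the pair's combined strength.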
Total salary of national team players?
Average attendance at top flight matches?
Number of continental Champions League trophies lifted by clubs from the domestic league?
Domestic players sold for more than $X in the last Y years?

