Sunday, July 26, 2020

Cherry Picking of Data: The Achilles' Heel of Bayes' Rule

We are awash in data.  In the age of Facebook/Amazon/Google, the amount of data generated, analyzed, and acted upon is beyond comprehension.  In a 2018 article, Forbes reported that 2.5 quintillion bytes of data were being generated every day.  A quintillion is a 1 with 18 zeros after it!  How do we even picture a quantity of that magnitude?  Here's a YouTube attempt at comparing a quintillion pennies to some familiar objects like football fields, the Empire State Building, the Sears Tower, and so forth.   If each penny were a single byte of data, two and a half times THAT is how many bytes of data we would be generating each day with all of our Google searches (5 billion a day), our cell phones, our Snapchat/Instagram photos, our interconnected IoT devices, and so on.

But that was way back in the ancient world of 2018.  The pace is only accelerating!

Needless to say, humans are incapable of processing that much data, and so the very devices -- computers -- that generate these data on our behalf are also analyzing and acting on them on our behalf through machine learning.  Data science is thus by far the most far-reaching and influential science of today.

Much of data science is based on the use of Bayes' Rule, a fundamental tenet of probability and statistics.  It says that the probability of a hypothesis being true given specific evidence is proportional to the product of two things:  the probability of the hypothesis being true in the absence of that evidence (the prior) and the likelihood that the specific evidence would turn up if the hypothesis were indeed true.  (Strictly speaking, there is another factor involved, a denominator that normalizes the result so that it is a proper probability, but the two factors in the numerator are the crucial ones.  In formula form this is p(H|E) = p(E|H) p(H) / p(E).)
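To make the formula concrete, here is a minimal Python sketch for a yes/no hypothesis. The numbers are purely hypothetical: a prior of 1%, and evidence that turns up 90% of the time when H is true but also 5% of the time when H is false. The denominator p(E) is expanded over H and not-H, which is one common way (an assumption of this sketch) of computing the normalizing factor.

```python
def posterior(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Return p(H|E) by Bayes' Rule for a binary hypothesis H versus not-H."""
    # Numerator: the two "crucial" factors, likelihood times prior.
    numerator = p_e_given_h * prior
    # Denominator p(E), expanded by the law of total probability over H and not-H.
    evidence = numerator + p_e_given_not_h * (1.0 - prior)
    return numerator / evidence


# Hypothetical numbers: rare hypothesis, fairly strong evidence.
print(f"{posterior(0.01, 0.9, 0.05):.3f}")  # 0.154
```

Even fairly strong evidence lifts a 1% prior only to about 15%, which is exactly the kind of sober answer one might be tempted to "improve" by choosing which evidence to feed in.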

The main point of this post, however, is not to discuss the appropriateness of using Bayes' Rule.  It is entirely appropriate and is indeed agreed to by all rational scientists, either explicitly or implicitly.  In fact, many scientists would say that acceptance of Bayes' Rule is one of the most important criteria for being considered rational at all.

No, the point here is not to discuss Bayes' Rule per se but to discuss its Achilles' heel: the cherry picking of data and what that leads to.  Given that we are awash in data, we have many cherries to pick from.  Given that we are fallible human beings with agendas to pursue, we are all too easily tempted to select data that reinforces the hypotheses we wish to believe and to avoid like the plague any uncomfortable data that may undermine the hypothesis we are (consciously or unconsciously) rooting for.

There's an old saying in computer science, "Garbage In, Garbage Out (GIGO)," meaning, of course, that the results of a computer program are only as good as the data you put in.   If you put into a program data that the program was not designed to handle, you will likely get results back that your program was never intended to produce.

With Bayes' Rule the situation is more subtle.  GIGO still holds for Bayes', of course, but with Bayes' the problem is not with garbage but with pristine, beautiful answers.   Answers that are too beautiful.   For with Bayes' you have the additional problem of getting only pleasing results out because you avoided putting anything unpleasant into the formula in the first place.  (This is related to the problem of overfitting in Machine Learning, but we won't go down that rabbit hole.) 

You don't even have to doctor the data to abuse it for personal gain (whether financial, psychological, emotional, or political).   All you have to do is ignore data that doesn't fit your hypothesis or that somehow weakens it.  All you need to do is cherry pick the data.
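Here is a small illustration of how cherry picking distorts the result without doctoring a single number. It is a sketch under stated assumptions: the pieces of evidence are treated as conditionally independent, each is summarized by a hypothetical likelihood ratio p(E|H) / p(E|not H), and the starting prior is 50/50. Bayes' Rule in this "odds" form multiplies the prior odds by each ratio in turn.

```python
import math


def update(prior: float, likelihood_ratios: list[float]) -> float:
    """Posterior p(H | all evidence), assuming the pieces of evidence are
    conditionally independent given H and given not-H (an assumption)."""
    prior_odds = prior / (1.0 - prior)
    # Each ratio > 1 supports H; each ratio < 1 undermines it.
    post_odds = prior_odds * math.prod(likelihood_ratios)
    return post_odds / (1.0 + post_odds)


# Hypothetical likelihood ratios for six genuine pieces of evidence.
evidence = [3.0, 0.25, 2.0, 0.5, 4.0, 0.2]

honest = update(0.5, evidence)
cherry_picked = update(0.5, [lr for lr in evidence if lr > 1.0])

print(f"all evidence:  {honest:.3f}")         # 0.375
print(f"cherry-picked: {cherry_picked:.3f}")  # 0.960
```

With all six items, the hypothesis ends up less likely than not; keeping only the three favorable items makes it look nearly certain. Every number fed in is genuine; only the selection changed.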

When it comes to Fake News and political spin, the informal misuse of Bayes' Rule has reached the point of being a pandemic.  It is probably one of the main reasons we have become such a dangerously polarized society.  Each of two polar opposites ignores the evidence cherished by the other side.   We tend only to look at facts that reinforce our biases.

Bayes' Rule doesn't tell us which data to count as relevant evidence in our deliberations.  But it is unethical to ignore relevant data.   In a court of law, moreover, withholding exculpatory evidence is considered a crime.  This is precisely the crux of the matter in the recent abuses of the FISA courts.
At the start of the COVID-19 pandemic,

What can be done about this?  How can one guarantee that relevant data is not being ignored or simply brushed aside?  Should there be some international organization dedicated to establishing practical norms for ethical data science?  There has been plenty of discussion, and there have been plenty of articles, on the topic.   But the concerns seem to cluster mostly around topics such as data privacy, fairness, social justice, and the general desire to avoid unpleasant outcomes.  These concerns seem to stem from worries about taking inappropriate actions based on the data.

We need to separate the actions we take from the analysis of the state of the world.  Fear of taking the wrong actions should not prevent us from taking a sober and honest look at the data -- all the relevant data -- and not cherry picking to avoid unpleasant or inconvenient truths.





Thursday, July 23, 2020

Democrats and the God of the Narrative

People involved together in a crime must necessarily lie to cover their tracks, and to lie effectively enough to convince a jury, they must lie in a coordinated fashion.  They must collude.  But to collude efficiently, they must do so in top-down fashion;  they cannot allow a bottom-up, improvised, uncoordinated sort of lying, for that would be too easily discerned.  The individual liars must not embellish the lies with additional lies of their own.   All the liars must get on the same page and do so quickly.   They must get in lockstep with one another, and this will not happen unless it somehow comes from the top.

When it comes to the criminal courts, we generally don't try groups of people as groups.   We try individuals.  An individual criminal must not only be consistent in his lying -- which is certainly difficult enough -- he must also have corroborating witnesses who are willing to lie on his behalf.  And lying in court is a risky thing.  There are stiff penalties for perjury.

Civil courts are different.  Organizations can certainly be sued in civil court.  It happens all the time.  Individuals are sued as well.  We also have the RICO laws.

From HG.org, we have the following quotation:
RICO law refers to the prosecution and defense of individuals who engage in organized crime. In 1970, Congress passed the Racketeer Influenced and Corrupt Organizations (RICO) Act in an effort to combat Mafia groups. Since that time, the law has been expanded and used to go after a variety of organizations, from corrupt police departments to motorcycle gangs. RICO law should not be thought of as a way to punish the commission of an isolated criminal act. Rather, the law establishes severe consequences for those who engage in a pattern of wrongdoing as a member of a criminal enterprise.
Witness tampering is among the offenses covered by the RICO statutes.  So, if it can be proved that an organization engaged in bribery in exchange for collusion and fake corroboration on the witness stand, that organization can face civil penalties, and the individuals involved can face criminal prosecution.

But, as I said in the beginning, to lie or collude efficiently, there must be top-down directives.  

Which brings us to the Democratic Party.  It has often been noticed that on many major stories -- especially those dealing with President Trump -- several media outlets uncannily use the same phraseology, if not the exact same phrases, to communicate the same negative opinion toward Trump's person, beliefs and/or actions.   This cannot be a coincidence.  Given that today's news cycles are so rapid (high frequency, nearly instantaneous response), this coinciding of phraseology can only mean one thing:  It happens due to a top-down communique from a single organization, and the only viable candidate for that one organization is the Democratic National Committee or DNC.

Fortunately, today's news environment is not limited to the handful of willing colluders, not limited to CNN, MSNBC, etc., outlets that are undoubtedly receiving talking points on a daily basis from the DNC.   The Internet is brimming with myriad alternative sources.  The secondary sources may not be on distribution for the DNC's talking points and, indeed, may not even be sympathetic to those points.  But there are plenty who are, and some who have a more respected and permanent place in the info world, such as Wikipedia.

But the upshot of all this is that the ultimate source of this collusion, the DNC, is not interested in the truth but only in one thing:  power.  When it comes to Republican talking points, there tends to be more individualism.  That's why there is a plethora of conspiracy theories on the right but the left seems to be more in lockstep.  The right wing conspiracy theorists are not waiting for talking points from the RNC.  Fox News may pay attention to RNC talking points, but the individual talking heads at Fox are not shy about disagreeing with the RNC.   It would seem also that the RNC talking points are shaped more by the bottom-up mood of the rank and file than by the opinions of a few elites at the top.

To be sure, the RNC in the past has operated in some ways similar to the top-down approach of the DNC.  But that was before the political earthquake of 2016 called Donald Trump.




Sunday, August 24, 2014

Periodic Phenomena: The Limits of Fourier Series and Climate Prediction

The climate debates have raged for the last 25 years with each side calling the other nasty names:  'alarmists', 'deniers', 'skeptics' and so on.  But if we take a step back and examine what we would all like to know, namely, what the true long-term trends for global temperature are, then we can make a few a priori observations that are perhaps familiar to mathematicians and engineers but, unfortunately, seem to be underappreciated by many climate scientists (not to mention politicians).

The observations have to do with the use of data sampling and Fourier analysis.  By sampling dynamic phenomena (with sensors, for example) and applying Fourier analysis, one attempts to determine the major oscillatory components underlying the signals generated by a complex system.  Joseph Fourier, in one of the most brilliant discoveries of modern mathematics, showed that any reasonably well-behaved periodic function can be expressed uniquely as an infinite sum (i.e., an infinite series) of basic periodic functions, the familiar sine and cosine functions we all learned in high school trigonometry.  Moreover, this decomposition into sines and cosines applies to all such functions defined on a finite interval, as long as we treat the function as if it were periodic, in the sense that whatever pattern it illustrates on the finite interval is assumed to repeat indefinitely.

Note that last qualification!  It is the crux of the matter.  We can apply Fourier analysis to any function over a limited interval if we pretend it is periodic, with the period defined by the length of that interval.  In other words, it only truly applies to periodic functions.  The image below illustrates a typical periodic function.


Any piece of an arbitrary, non-periodic function can be isolated and then turned into a periodic function for purposes of Fourier analysis.  And this is precisely what is done every time we apply Fourier analysis.

There are inherent limitations in using this technique.  When trying to apply it to understanding the functions at work in the world (in climate phenomena, e.g.), we are restricted, of course, to the interval of time during which we have reliable data.   And that is a very short interval compared with the time since the formation of the world about 4.5 billion years ago.  We have some temperature data going back a few centuries, but the quality of that data gets worse the farther back we go, and before the 16th century, e.g., it is not very reliable at all.  We try to substitute proxies (such as tree ring growth), and that is all well and good, but these proxies are usually indicators of more than just temperature.  So the data records get fuzzy and the uncertainty grows.



Take a look at the image above.  The function is just the line y = x, which rises indefinitely at 45 degrees.  However, when we restrict our attention to the interval from -pi to +pi and pretend that the function is periodic, with the period determined by this interval, we get a sawtooth pattern.   Thus, the Fourier series really only tells us about the periodicity present in this repeated pattern.  It tells us nothing about what the graph really looks like outside this interval.  There could be other long-term periodicities at play outside the interval of interest, but the Fourier series is the same for all functions that agree with this one on the chosen interval.
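The sawtooth can be checked numerically. For f(x) = x on the interval from -pi to +pi, the standard Fourier series is x = 2 [sin(x) - sin(2x)/2 + sin(3x)/3 - ...], that is, sine coefficients b_n = 2(-1)^(n+1)/n and no cosine terms because the function is odd. The Python sketch below (the function names are just illustrative) sums the first 2,000 terms and evaluates the truncated series at x = 1 and at x = 1 + 2pi, one full window to the right.

```python
import math


def b_coeff(n: int) -> float:
    # Closed form of (1/pi) * integral from -pi to pi of x*sin(n*x) dx.
    # All cosine coefficients vanish because f(x) = x is odd.
    return 2.0 * (-1) ** (n + 1) / n


def partial_sum(x: float, terms: int = 2000) -> float:
    # Truncated Fourier series of f(x) = x built on the window [-pi, pi].
    return sum(b_coeff(n) * math.sin(n * x) for n in range(1, terms + 1))


inside = 1.0
outside = 1.0 + 2.0 * math.pi  # one full window to the right

print(partial_sum(inside))   # close to 1.0, the true value of y = x here
print(partial_sum(outside))  # also close to 1.0, although y = x is about 7.28 there
```

Inside the window the series reproduces the line; outside it, every sine term repeats with period 2pi, so the series returns the sawtooth value rather than the rising line. Nothing in the coefficients records that the line keeps going up.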

Thus we have a fundamental problem in determining the true frequency content (i.e., the true periodic nature) of the physical phenomena at work when we are necessarily limited to a very short observation window.  The rise in the function depicted could in fact be just a tiny segment of a much larger sine wave about which we have no information, because our Fourier analysis assumes that the function is periodic with the period determined by the interval.
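To see how innocent a piece of a long sine wave can look, the sketch below (hypothetical numbers, standard library only) samples a sine wave whose period is 40 times the length of the observation window and fits an ordinary least-squares straight line to the samples.

```python
import math

PERIOD = 40.0  # hypothetical: the cycle is 40 times longer than the window
N = 200        # number of samples inside the window [0, 1]

t = [i / (N - 1) for i in range(N)]
y = [math.sin(2 * math.pi * ti / PERIOD) for ti in t]

# Ordinary least-squares straight line through the observed segment.
t_mean = sum(t) / N
y_mean = sum(y) / N
slope = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y)) / sum(
    (ti - t_mean) ** 2 for ti in t
)
intercept = y_mean - slope * t_mean

# Largest gap between the true sine and the fitted line inside the window.
max_dev = max(abs(yi - (slope * ti + intercept)) for ti, yi in zip(t, y))

print(f"fitted slope: {slope:.4f} per unit time")
print(f"largest deviation from the line: {max_dev:.1e}")
```

Within the window the sine is almost perfectly straight: the largest deviation from the fitted line is a small fraction of a percent of the wave's full amplitude, so an observer confined to the window would simply report a steady upward trend.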

Engineers know this very well.  Short-window Fourier transforms always impose tradeoffs between time resolution and frequency resolution.  The Nyquist theorem states that you have to sample a signal at more than twice the highest frequency present in order to see, i.e., faithfully reconstruct, that frequency content.  This is a fundamental theorem and reality of communications engineering.  Also, to see the low frequencies present, you have to sample accurately for a long time.
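Both halves of that paragraph can be illustrated in a few lines. The sketch below uses hypothetical numbers: a 10 Hz sampling rate over a 5-second window. A 9 Hz cosine, which violates the Nyquist condition, yields exactly the same samples as a 1 Hz cosine (this is called aliasing). Separately, the frequency grid of a discrete Fourier transform over the record is spaced 1/T apart, so the window length T sets the finest frequency detail, and the lowest nonzero frequency, that the analysis can resolve.

```python
import math

FS = 10.0  # hypothetical sampling rate, samples per second
N = 50     # 50 samples, i.e. a 5-second observation window
t = [k / FS for k in range(N)]

fast = [math.cos(2 * math.pi * 9.0 * tk) for tk in t]  # 9 Hz, above FS/2 = 5 Hz
slow = [math.cos(2 * math.pi * 1.0 * tk) for tk in t]  # 1 Hz, below FS/2

# Sampled at 10 Hz, the 9 Hz wave is indistinguishable from the 1 Hz wave.
aliased = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(fast, slow))
print(aliased)  # True

# Frequency grid of a real-input DFT over this record (the same values that
# numpy.fft.rfftfreq(N, d=1/FS) returns): spacing FS/N = 1/T.
freqs = [k * FS / N for k in range(N // 2 + 1)]
print(freqs[1], freqs[-1])  # resolution of 0.2 Hz, Nyquist limit of 5.0 Hz
```

Sampling faster cures the aliasing, but only a longer record improves the 0.2 Hz resolution.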

With regard to climate data, we have not been sampling accurately for a very long time.   If the earth is approximately 5 billion years old and we have been sampling climate data accurately for 500 years (being generous), we have observed a window of time of about 0.0000001 (one ten-millionth), or 0.00001%, of the true age of the earth.   Such an interval is completely unreliable for predicting 1,000-year periodic phenomena, not to mention any longer periods.
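The arithmetic, and what it implies for a discrete Fourier analysis, fits in a short sketch. It uses the post's round figures (5 billion years, 500 years of reliable data) and, as a simplifying assumption, one reliable sample per year.

```python
EARTH_AGE_YEARS = 5e9  # the post's round figure
RECORD_YEARS = 500     # the post's (generous) span of reliable measurements

fraction = RECORD_YEARS / EARTH_AGE_YEARS
print(f"{fraction:.0e} of Earth's history")  # 1e-07
print(f"{fraction * 100:.5f} percent")       # 0.00001 percent

# Assumption: one sample per year. The lowest nonzero DFT frequency is then
# 1 / (number of samples * sample spacing) = 1/500 cycles per year.
SAMPLE_SPACING_YEARS = 1.0
lowest_freq = 1.0 / (RECORD_YEARS * SAMPLE_SPACING_YEARS)
print(f"longest resolvable period: {1.0 / lowest_freq:.0f} years")  # 500 years

# A 1,000-year cycle has frequency 0.001 cycles per year, below that first bin.
print(1.0 / 1000 < lowest_freq)  # True
```

With a 500-year record, the lowest nonzero frequency bin corresponds to a 500-year period; a 1,000-year cycle falls below it and can show up, if at all, only as an apparent trend.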

I am not saying that one should not be concerned about the warming properties of greenhouse gases.  There are some, like methane, that are very potent potential contributors to temperature increase.  But the contribution of our own CO2 is minuscule compared to the contributions of other phenomena, and we are aware of only a small fraction of the natural phenomena, owing to the limited amount of time we have been observing them.  Yes, CO2 contributes to global warming.  But so does rubbing my hands together.




The Dangers of Ersatz Religion

The German word 'Ersatz' means 'substitute' in English, a term picked up by psychologists to describe what happens when people compensate for an unfulfilled psychological need by substituting something else, usually an inferior copy of the real thing needed.  Wikipedia has an article called "Ersatz good," which describes early German usage of the term (in German it can mean either a perfectly good substitute or an inadequate one).  In German it has more of a literal meaning than the psychologized meaning it took on in English during the 1960s.  Today the English usage seems more closely allied with the German usage than it was back then.  We hear of ersatz coffee, for example, to simply denote an inferior version of java.

The philosopher David Lewis considered "ersatzism" of the philosophical / psychological kind extensively in his analysis of possible worlds ontologies.  He talks about linguistic, pictorial and magical ersatzism.  Without getting into the distinctions between these three types, let us say that science necessarily engages to some extent in the first two, linguistic and pictorial ersatzism, simply by the sheer fact that it is the job of science to create mathematical models of what we think is going on in nature, and mathematical models are nearly always simplifications or idealizations of the real thing they are trying to model.  However, with the advent of complex computer models and simulations of natural, dynamic phenomena, there has been a noticeable drift on the part of many scientists into the third category, magical ersatzism.

Science has long relied on the corrective feedback mechanism of experiment to validate or refute its myriad hypotheses.  But since the dawn of the computer age in the 1950s, and especially in the 1990s and beyond, computer simulations have slowly and subtly taken the place of hard empirical evidence in many areas of scientific research.  Owing to the ever-evolving complexity and resource demands of such simulations, many of them are not easy for the ordinary scientist to reproduce -- let alone the educated layman.  They may require supercomputers to run and -- heaven forbid! -- the use of arcane computer languages like FORTRAN.  Because of this, many of these models and simulations have taken on the air of divine oracles that only the anointed high priests may approach, and they have become substitutes for nature herself as the judge of the soundness of theories.

The global warming hysteria of the 1990s and early 2000s is a prime example of this.  Here the IPCC-endorsed models, along with those of NASA and other normally reputable organizations, have taken on the role of infallible courts of last appeal.  Hypotheses and theories of climate dynamics were endorsed or rejected solely on the basis of whether they agreed or disagreed with these models.  The result was self-reinforcing confirmation bias on the part of those who stood guard over and maintained these models.

But, alas, Mother Nature is not to be fooled forever.  Even the climate change alarmists cannot deny that their predictions of CO2-caused temperature rise have not been borne out by the empirical evidence of the last 15 years.  Their new buzzword to deal with all of this is 'hiatus.'  Here's how the editor of Science magazine summarizes the findings:
Global warming seems to have paused over the past 15 years while the deep ocean takes the heat instead. The thermal capacity of the oceans far exceeds that of the atmosphere, so the oceans can store up to 90% of the heat buildup caused by increased concentrations of greenhouse gases such as carbon dioxide. Chen and Tung used observational data to trace the pathways of recent ocean heating. They conclude that the deep Atlantic and Southern Oceans, but not the Pacific, have absorbed the excess heat that would otherwise have fueled continued warming.  
My translation is that they are finally admitting that climate scientists' predictions of temperature rise over the last 15 years have been, on the whole, consistently wrong, i.e., not borne out by the evidence.  In fact, some of the mechanisms now being offered as explanations are very similar to those offered by scientist Roy Spencer with regard to the Pacific Decadal Oscillation.  Spencer, it should be noted, is consistently lambasted as a 'climate denier.'

My concern is that there are not only faulty scientific methodologies at work in the alarmist camp of climate science but also ersatz religion.  When you combine atheism (i.e., the denial of one's true religious needs) with scientism (the overemphasis of human rationality to solve all problems), you get a bad mix:  people looking for meaning in their lives and trying to find it in places it doesn't exist.  Thus, well-intended ersatz goals for humanity -- "we need to save the planet from evil planet-destroying capitalism!" -- are combined with wishful-thinking pseudo-science (i.e., linguistic and pictorial ersatzism) until you arrive at magical ersatzism.

The danger is that these sorts of pseudo-scientific ersatz religions can easily drift over into totalitarian practices, like lefties trying to shut down scientific debate on the matter.  (Remember Al Gore's "settled science"?)  Atheism as we know it today does not really have a very long history.  It emerged as a significant militant force in the world only with the dawn of the Enlightenment in the 18th century (which in turn has its roots in the Protestant Rebellion of the 16th century).  The French Revolution saw atheism combine with political passion; they invented the guillotine to deal more efficiently with deniers.  It bore the fruit of millions of murdered victims during the reign of totalitarian regimes such as Nazism and Communism during the 20th century.  There were plenty of Nazi scientists whose research and "data" sought to confirm the notion of German racial superiority and the inferiority of everybody else, most notably the Jews.

In complicated dynamical systems, whether natural or man-made -- and especially in global, natural, earthly phenomena -- oscillations are the norm.  Control engineers know this all too well.  Many periodic phenomena are superimposed to yield many cycles of behavior, from diurnal cycles to seasonal cycles to decadal, 20-year, 40-year, 100-year cycles and beyond.  Periodic phenomena are a ubiquitous feature of nature:  rotation and orbits around hot masses called stars are the universal law of things.  We have only a very meager understanding of these cycles on planet earth and have data samples for only a small fraction of the earth's 5-billion-year history.  It is preposterous to substitute hypothetical computer models, based on a small sampling of data that only begins to tell the story of very short-term phenomena, for empirical evidence.

But the real danger stems from those who would combine ersatz religions (pick your favorite 'ism') with pseudo science.  People need to have meaning in their lives and will search for it until they latch onto something that resembles it.  Surely, the prospect of saving the planet offers such meaning.  "Hope and change" are slogans of such people who are looking for real meaning in their lives but who have settled for something far less.  Hope is a good thing as long as what you are hoping for is a good thing.  But blind hope is as dangerous as blind faith.  Change is good, too, as long as the change is from bad to good, from worse to better.  But change from moderately good to something really bad is not to be desired.  

It is best to strive honestly and humbly to discover the difference between real and apparent goods and then to set goals based on a sound understanding of this.  Otherwise, we'll be jousting at windmills with Mr. Quixote or, worse, eliminating millions of innocent people like Mr. Hitler and Mr. Stalin.

To be fair and complete, the climate scientists' proclamation that the hiatus is real, although a step in the right direction, does not yet allow one to claim definitively that global warming is now turning into global cooling. They claim that the heat is being stored deep in the Atlantic Ocean and will reappear, or at least that the ocean will cease absorbing heat from the atmosphere in 15 years, giving rise to accelerated warming then. The hiatus is only a leveling off, not a reversal. We may have to wait a few more years before they admit to seeing a reversal trend. Then the word 'hiatus' will have to be discarded for something new, like 'temporary dip' in temperature. But this is what natural oscillations are all about: rises and dips in periodic phenomena.  They are still sticking to their story of a long-term upward trend.  But that upward trend may in fact just be the rising phase of a longer-period oscillation.  Only time will tell.

Saturday, June 28, 2014

The Idiocy of Timetables in Armed Conflict

The recent sudden collapse of large swaths of territory in Iraq, which have fallen under the control of the Islamic State of Iraq and Syria (ISIS), is a testimony to the idiotic policy of the Obama Administration, idiotic for setting politically motivated timetables for withdrawal from Iraq.

Timetables like this in armed conflict are idiotic for several reasons.  First, they send a message to adversaries that all they have to do is simmer down, go into hibernation, regroup, and refortify until the magic date of withdrawal arrives.  Can you imagine a battle, like the one currently being waged by the Iraqi government in Tikrit against ISIS, being fought under the announced premise that the attacking government troops would simply declare victory at noon today and then retreat?   What kind of a battle plan is that?   Announcing when you are going to quit the battle?  That's like announcing to your opponent in a football game that you are going to put in your B team after the half.  What would your opponent do?  They would beef up their defense, prevent you from scoring, rest up on offense, and then bring in their crack offensive team after the halftime show -- and then decimate your B team.

So, timetable policies in armed conflicts are idiotic in that they basically reveal to your adversary your strategy.  You simply don't tip your hand like that in armed conflict!  This is Military Doctrine 101.  Don't reveal your strategy to your enemy!  It is foolish and childish.

Timetables like these are also foolish because you can only negotiate from a position of strength.  Only while you are in control of the situation -- i.e., have significant troops in country -- can you negotiate either with your adversary (if even possible) or, more importantly, with the team you are going to leave behind.  In the case of Iraq, the Maliki government and their military supporters -- on whom we spent billions to train and equip -- won't concede anything like a Status of Forces Agreement between us and them if they know we are planning to vacate anyway, based on a politically motivated timetable.  They too, like our adversaries, will simply wait us out.  Once you've left the country and put them totally in control, you've lost your strength at the negotiating table.

A third reason why such timetables are idiotic is that they convey an overall message of weakness -- or fecklessness -- to our adversaries on the larger geopolitical stage.   We are seeing this in spades as Iran rushes in to fill the vacuum we've left behind in Iraq.  Broadcasting our strategy emboldens them to take action.  This message was clearly not lost on Russia, which, after Obama's vacillation and phony red lines in Syria, seized the opportunity to annex Crimea.  It probably won't be long before China grabs more territory, knowing that we will do nothing but wring our hands and complain at the UN, where they have a veto. Taiwan is probably in the crosshairs already.

A fourth reason is that such weakness conveys a message to "allies" like Saudi Arabia and Egypt that they cannot count on the US for current or future support, that they'll have to go it alone.  And where do you think the money for ISIS is coming from?  From Saudi Arabia!   This conflict, above all, is a larger Sunni-Shiite conflict that now threatens to embroil the whole region.

In Syria it is essentially a Sunni-Shiite conflict.  Assad is an Alawite, an offshoot of Shiites.  His opponents are all Sunnis.   Hezbollah in Lebanon is supporting Assad.  Guess what religion they are.  That's right, they're Shiites!   Iran is supporting Assad in Syria.  Guess what religion they are.  Yep, you guessed it right again:  They are Shiites!   Guess what religion Maliki and his supporters are?   OK, this is getting tedious, but I sense you are getting the picture now.   The Sunni-Shiite conflict in Syria, with its 100,000+ civilian casualties and millions of refugees has spilled over into Iraq.

And within the Sunni ranks you have varying degrees of nastiness.  There are moderates like the Free Syrian Army (at least they are pretending to be moderates, perhaps like the Muslim Brotherhood in Egypt were pretending to be moderates once upon a time, at least until they came into power -- with our help).  And there are the nasty ones like Al-Qaeda and ISIS, who make Al-Qaeda look like moderates.

Any way you look at it, setting a timetable in armed conflict is precisely the wrong thing to do.  The territory -- and more importantly the hearts and minds -- that we fought and died for in Iraq was not yet ready to be abandoned.  If we didn't like the Maliki government, we should have done something about it while we were there and had strength in negotiations.  But we foolishly announced a timetable for withdrawal, and that set all of this mess in motion.

I have not seen, in over 50 years of monitoring current events -- sort of a hobby of mine -- a US foreign policy so shortsighted, childish and downright foolish -- idiotic from almost any perspective.


Thursday, December 19, 2013

Phil Robertson

In all the hoopla about Phil Robertson's remark concerning the Bible's attitude toward homosexuality, few seem to have noticed that A&E's reaction stems, pure and simple, from regard for its pocketbook.  They do not want to get sued by some advocacy group.  It is not that they care about homosexuality per se, but that they just don't want to get sued.  However, they may stand to lose a lot more money if 'Duck Dynasty' moves to another station.

The intimidation and shakedown artists are hard at work, and they have companies like A&E and Lockheed Martin shaking in their boots.  The latter announced today that they are no longer going to donate money to the Boy Scouts of America over the BSA's policy of not allowing homosexuals to be counted among their adult leaders.   UPS did the same recently as well.  You can bet your bottom dollar that this is about the shakedown artists, the advocacy groups making the rounds and threatening lawsuits.

Monday, July 8, 2013

Lone Ranger a Dud

This movie was a waste of time and money.  Although the cinematography was good and the sporadic laughs OK, the whole tenor of the movie was pathetic.  Simply put, it was another Johnny Depp, Anti-Capitalist, Anti-American rant!  I am beginning to become Anti-Johnny Depp.   Why doesn't he grow up and get a life?

It was too long and too tedious.  All the villains were in cahoots with the big bad railroad men.  The Comanches were portrayed as lily pure and innocent -- one should read about the real atrocities of the Comanches towards other tribes, not to mention against Euro-Americans.

I hope Disney loses money on this.  They deserve to lose money.

If Depp is so Anti-American, why does he keep asking Americans to open their wallets to him?