Cluster Analysis in R

This was going to be a magic post about clustering and its amazing abilities: finding hidden gems, gaining insight from the data, reducing a problem to a manageable situation, or better explaining a given situation to business partners that might need to focus on an appropriately segmented market.

The problems that I usually deal with have a mix of categorical and continuous variables, often identifying demographic information and geographical information tied to customer satisfaction surveys that are one of the tools a business partner has to answer very specific problems about their planning process and the evolution of their offers.

Clustering is a great way to group these existing customers, identifying common characteristics and adjusting offers and planning processes to satisfy current and expected demands. But the categorical data puts a dent in the clustering process: I can’t properly calculate distances between observations without defining a metric or properly adjusting for what the categorical distances mean. Many people have mentioned that there is no sense in turning, say, the days of the week into a number, because what exactly does 3.3 days mean, half a Wednesday? What does it mean when you have 3.5 ethnicity? We have to find another way to measure that: counting the frequency of categories and measuring against medoids.
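To make the distance problem concrete, here is a minimal sketch in plain Python. It assumes a Gower-style dissimilarity, which the post does not name: numeric columns contribute their absolute difference scaled by the column range, and categorical columns contribute 0 on a match and 1 otherwise. The `customers` rows and the helper names are made up for illustration. The medoid is simply the real observation with the smallest total distance to all the others, so no "half a Wednesday" ever appears.

```python
def numeric_ranges(rows):
    """Return max - min for each numeric column, or None for a categorical one."""
    ranges = []
    for column in zip(*rows):
        if isinstance(column[0], str):
            ranges.append(None)  # categorical column: no range needed
        else:
            ranges.append(max(column) - min(column))
    return ranges


def gower_distance(a, b, ranges):
    """Average per-column dissimilarity between two observations, in [0, 1]."""
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if r is None:
            total += 0.0 if x == y else 1.0  # simple matching for categories
        elif r > 0:
            total += abs(x - y) / r  # range-scaled absolute difference
        # a constant numeric column (r == 0) contributes nothing
    return total / len(ranges)


def medoid(rows, ranges):
    """Return the observation with the smallest summed distance to all rows."""
    return min(
        rows,
        key=lambda candidate: sum(
            gower_distance(candidate, other, ranges) for other in rows
        ),
    )


# Hypothetical survey rows: (age, preferred day, area type)
customers = [
    (25, "Mon", "urban"),
    (30, "Mon", "urban"),
    (35, "Tue", "urban"),
    (65, "Sat", "rural"),
]
ranges = numeric_ranges(customers)
center = medoid(customers, ranges)
print(center)
```

In R, the cluster package covers the same ground with `daisy(..., metric = "gower")` and `pam()`; the sketch above only spells out the arithmetic behind that kind of approach.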

Next: datasets that feature categorical and continuous variables.

On the horrible effects of Star Wars

Recently, with all the hoopla about Star Wars and derivative stories, there has been an increase in requests for playtime built around Star Wars and its changing contemporary mythology. My kid has been asking about Jedi, the Force, droids and things like that.

So, we started watching Episode IV, with a huge amount of commentary about the politics of the time when it was written, the author’s desire to write around Joseph Campbell’s Hero’s Journey, the need to make the characters larger than life, almost unbelievable, and therefore the hyperbole and the exaggeration. Most importantly, there was the need to accept that the story is flawed, and that we have to take it without close examination: just watch and see the journey of our poor hero.

Of course, after all has been said and watched, all the Legos played with, there is one request.

Kid – “I want to make a Death Star.”

Me – “There is a nice Lego one.”

Kid – “No, it has to be real size.”

Me – “Of course.”

Kid – “Can we have it finished for the weekend?”

Running R in Docker

It can’t be easier than that. I just launched a DigitalOcean instance with the Docker appliance and installed Rocker. A perfect, nice RStudio server to play with when on the move.


Accessorizing your party by inviting The Other

These days no party is complete without the correct amenities: a magician, a caterer, the correct decoration for the season.

After all the papel picado has been bought from that quirky little place in the totally-not-sketchy store down at the mall, bowdlerized and non-threatening enough, what else can the happy WASP get to add proper color to the party? We have the Day of the Dead decorations, the ghoulish decals and lights, and the yummy biscuits; what else can you get that your neighbors can’t simply buy at the local Whole Foods?

Why, invite a real Latino, of course!

They speak with an adorable accent, are not as threatening as the African American kids at school, will probably make less money than your spouse, so not a challenge there! Since these Latinos sort of understand English, you can always depend on them to give you help when carrying trays full of pan de muerto, and they are so helpful when cleaning afterwards!

Also, since they are trying to prove that they are better than the other minorities, their gifts are large and extravagant, the sort of thing that you can always use as a conversation piece in future parties; nothing to play with, of course, because these garish colours are dreadful, and so difficult to show, but above the mantel, these papier mâché bulls look astounding.

Be wary, though: there are caveats when inviting these colorful characters to your home. First, they will want to talk to you, with their terrible accents and their crazy loud shirts. Also, all those gold chains! Luckily, if you invite two they will just sit in a corner talking quietly amongst themselves. On that note, though, do not invite many more, because they get loud and obnoxious when in groups! Or simply ignore them, because soon they will start doing their silently-staring-into-the-horizon thing. It must be some shaman thing from this ancient ethnic group!

Finally, once you have them, if forced to talk to them, be sociable! These Latinos will love to answer your questions about their country and culture, so remember to ask them where they are from! Sometimes you might have to ask many times, like where they are really from, or their family, or their ancestors. Also, ask them where they got their lovely accent, where to get a nice maid to clean the house, or what their favorite Mexican restaurant is; no matter whether they come from Peru or Chile, everybody knows the Latinos all love Mexican food.

Good luck with your party, and make sure you pin it!

Working on Data Analysis

And then somebody tells me that the technique I am using was developed only three years ago, and you can’t find information about it.

Seriously? This thing is as old as WWII.

Taming the swarm

The continuous attempts to tame the flood of information have a lot in common with detecting emergent patterns in complex systems: we look for a deterministic model that can explain the behavior detected in a large number of observations, present that model, and observe how it stacks up against further observations that might or might not fit it.

Our theoretical thinking, though, is limited to the outliers that actually perform the creation of the models, the examination of the underlying assumptions and the critical review of the results presented.

Yes, the large majority of data scientists behave in a critical manner, explore within the limits of their modelling tools, and then use those limitations to explore further, refine and limit their model and observation parameters.

Those limitations, the constraints inherent to the production of a theory of the particular model under study and the limitations of imperfect knowledge and limited computing time move scientists to accept limited modeling techniques, as well as try to extend the metaphor of the underlying theory, effectively assuming the model used to study physical or economic events might use the limited assumptions of theoretical spaces within which the model is first derived.

For example, a linear model attempting to simulate and predict a situation in which there are at least four independent variables is going to fall short of explaining the interactions that arise among those variables. Furthermore, the assumption that the model is linear further reduces its validity, since it cannot explain the various nonlinear effects that might occur. The presence of higher order components will lead to interesting behavior that can be explained with many different mathematical tools, but definitely not within the framework of a first order linear model.
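A toy illustration of that last point, as a sketch rather than anything taken from a real study: the data below is invented, uses a single variable instead of four, and is generated from a pure second order relationship (y = x squared). An ordinary least squares line fitted on x explains none of the variance, while the same fitting routine applied to the squared feature explains all of it. The helper names `fit_line` and `r_squared` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_residual = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_residual / ss_total


# Invented data with a purely second order relationship.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

# First order model: fit directly on x.
linear_fit = fit_line(xs, ys)
linear_r2 = r_squared(xs, ys, *linear_fit)

# Same routine, but on the higher order feature x squared.
squared = [x ** 2 for x in xs]
quadratic_fit = fit_line(squared, ys)
quadratic_r2 = r_squared(squared, ys, *quadratic_fit)

print(linear_r2, quadratic_r2)
```

The methodology in the first fit is flawless; it is the first order assumption that throws the structure away.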

Seeing this happen in academia makes me think about the silo nature of research: models that could benefit from the hand of a mathematician or physicist are left unregarded, barely explained, while extra rigour is demanded of the simplest of models, in an attempt to formalize and eliminate criticism due to methodology, when that methodology is not at fault here: the initial assumptions forget that the reality being described is perhaps better described using some other, higher order, dynamic description.

Meanwhile, we get to listen as migration is explained as dependent on happiness, while a whole other raft of environmental, economic and social factors is left unmentioned.

Another Fourth

These days that handle heat and patriotism, mixed with bubbles, street carnivals and overpriced flags made in China, are also reminders of subtler emotions, deeds and promises: we all read the books, and can almost understand the meaning and connect with the context, but history evades us amid the myriad distractions posed by the press, the friends and the parties. 4th of parties indeed, because we all became engrossed in enjoying a few days off, relaxing and, for a couple of days, allowing time to flow without a particular goal but that of letting the sun go down.

Simmer in the heat, allow the chair to creak and the ice to melt, high-pitched noises in the background, squeals of delight and enjoyment, all along with the threat of storms and bugs happily ignored.

Tired feet, air conditioners on high, ice cream for once being a valid option, and a long walk home after a town festival, carrying our thoughts about independence and rights as some cherished antiquities that inform our present. They need to be asked and aired far more often.

In the South and West, a Tax on Being Poor

Everywhere we see that Wilkinson’s book The Spirit Level applies; one of these studies points to regressive taxes as indicators of inequality and its higher associated costs. The NYT published an article about it, In the South and West, a Tax on Being Poor, and one of the paragraphs sums it up succinctly:

For every $100 increase on taxes at the poverty line, we saw an additional 7 deaths and 78 property crimes per 100,000 people, and a quarter of a percentage point decrease in high school completion.

The taxes tell part of the story, but as with other indicators, they are only a glimpse into a system that skews toward an affluent minority, while eliminating the social safety net and penalizing the lower percentiles in ways that eliminate mobility, increase morbidity and place additional costs on the social group.

Early on we were having a discussion on the social contract, and how the perception of that contract allows people, constituents, to express their concerns and have actual results in a decent time frame, thus diminishing the pain that misguided policies might inflict. However, when the contract emphasizes the individual right over the social one, we see issues such as the ones presented in the NYT article.

Austerity has long proven to be an ineffectual tool, and the policies that result in measures resembling it are painful, regressive and costly.

Promotion vs. Prevention

And now it comes down to how the PM directs their team: either by focusing on the final goal, identifying possibilities and taking risks in attaining that goal, or by making sure that the product management process is taken care of, that the stakeholders all maintain proper communication through channels, and that all figures related to the development and testing process are in place.

HBR now suggests that, depending on the manager’s focus, a product manager might be too focused on a promotion or a prevention style; it could easily be argued that a promotion focus better serves the startup PM, but it has also been shown that a prevention focus is particularly used by startups when dealing with the later stages of the product management cycle.

Quiz: What’s your style?


Do You Play to Win — or to Not Lose?.


On perceptions and paradigms as constraints

Good literature has this habit of turning assumptions on their heads, explaining the limitations of perceptions and the ways in which we, as humans, maintain artificial sets of constraints even though we might be trying to fight those same obstacles. Pretty much the way Arendt was criticized, we lose our product vision, precisely because of the way we have to fight the forces that push seemingly innocuous agendas onto our plate. We have to regroup, reassess and act again.

These assumptions that so easily maintain the status quo, or impede the realization of our initial vision, also affect us, derailing our intentions and leading us, and others, to rationalize the very same obstacles as necessary.

For example, a process that has been unreliable, or a continuous rechecking and retesting, or a routine series of tasks that take resources yet are performed because that is the way it has been done before, or because they are necessary to maintain data integrity on legacy systems, all seem to be valid uses of the time of developers and managers. The problem starts with the reason for those routine tests and tasks in the first place: is the system getting into legacy mode? Are the input processes correct and updated? Is there a way to fix all these issues that pop up under use, and while the system is stress tested by changing business demands and requirements?

There is no correct answer here: the business evolves and changes requirements, the installed system becomes legacy, and resources and allocation are scarce; the legacy IT system is already established, the implementation of another system is dependent on business timing, i.e. you won’t change Point of Sale just before Christmas season, and recovering the existing knowledge to incorporate it into a new application is serious enough – old, legacy work is knowable, routine and easily measured, although prone to errors and mistakes.

And that is where the vision vanishes, replaced by routine task management and endless quality control tasks that are precisely what launched the new system implementation in the first place.

But who manages that implementation, and how does that proceed?