Spain: 80 years in 30 seconds.

These days I have been looking at the statistics provided by the INE and the Government of Spain, and at their formats. The more data governments provide to their citizens, the more transparency, efficiency and citizen participation there will be, and at the same time the image of politics and politicians in our society will improve.

In general, I have been surprised by the amount of data that already exists, although the interface for getting it, and the fact that it comes pre-formatted with certain fields, makes it quite hard to use. The future of the information technology industry over the next decade will lie in processing large amounts of data and in the intelligence extracted from them, and I think we still have a long way to go.

As an example of what can be done with the data, I took the GDP and population figures of the autonomous communities to see how Spain has changed over the last 80 years, from 1930 to 2010. To complete the oldest data, I also used the historical statistics of the Fundación BBVA.

The format I used to visualize the data is the one popularized by Gapminder, which has worked so well for people like Hans Rosling (http://www.gapminder.org/) in his well-known TED Talks, and which can be used for free with the Google Motion Chart Gadget.

This is the result (press play to see the animation, and try playing with the size, the colors, the statistic on each axis, selecting autonomous communities and following them over time, etc.):

Several things about this chart surprise me:
- how similar the autonomous communities were at the beginning of the 20th century, in both population and GDP.
- the clear decline in GDP during and after the Spanish Civil War (mid-to-late 1930s).
- the economic upturn during the industrial era of the 1960s and 70s.
- the stall in the early 1980s.
- the dizzying growth of the economy over the last two decades.
- the large gap that now exists between Spain's big regions and the rest, in both population and GDP.
- the change in leadership among the autonomous communities, for example with communities such as Castilla La Mancha or Murcia moving up several positions.

From this exercise, and with very little effort, I have learned and visualized many things in a short time. I am sure you will find more interesting details and ways to use and interpret this data, for example by following your favorite autonomous community or comparing communities with each other. You can also do the same with other data available on the Internet. If you do, don't hesitate to share it, and go for it: the era of open data has just begun!
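If you want to try the same with other data, most of the work is reshaping it. As far as I understand the Motion Chart gadget's format, it expects one row per entity and time step: the entity label first, then the time, then the numeric columns. Below is a minimal Python sketch, assuming the GDP and population series have already been loaded into dictionaries keyed by region and then by year; the function and type names are hypothetical, and no INE or Fundación BBVA figures are included.

```python
from typing import Dict, List, Tuple

Series = Dict[str, Dict[int, float]]   # region -> {year: value}
Row = Tuple[str, int, float, float, float]


def to_motion_rows(gdp: Series, population: Series) -> List[Row]:
    """Build (region, year, gdp, population, gdp_per_capita) rows.

    Only (region, year) pairs present in BOTH series are kept, because
    every numeric column of a chart row needs a value.
    """
    rows: List[Row] = []
    for region in sorted(gdp.keys() & population.keys()):
        for year in sorted(gdp[region].keys() & population[region].keys()):
            g = gdp[region][year]
            p = population[region][year]
            per_capita = g / p if p else float("nan")  # avoid dividing by zero
            rows.append((region, year, g, p, per_capita))
    return rows
```

Each tuple becomes one row of the chart's data table; GDP per capita is added as an extra column so it can be picked for an axis or for the bubble size.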

on the Brain, the Internet, Privacy, Happiness, and why I am NOT closing this blog!

sometimes i feel like getting off the Internet. yes, you heard it right. i get up in the morning, check twitter, facebook, blogs, newspapers, email, RSS feed aggregators, etc., and i start feeling a sense of urgency, predestination, direction and expectation about what i should do next that bothers me. at times it feels as if i have outsourced thinking to the cloud and, with that, lost part of my freedom.

The Brain
when i think about my brain from a computer scientist's point of view, i basically think about a memory, a CPU, and a bunch of algorithms that help process information and feelings, make decisions, structure data, and more. take memory: it is becoming a commodity faster than you think. today you can buy at the supermarket storage with as much memory as your brain (10 TBytes) for $1000. CPU is going down the same path and becoming a commodity too; for instance, computational capacity can easily be outsourced to a cluster of computers in the cloud. by 2020 you will be able to buy CPU capacity equivalent to your brain's in the cloud for less than $1000.

In summary, the empty CPU and memory of your brain cost no more than a couple of thousand dollars, and the price keeps coming down… what makes our brain interesting is not the hardware (the memory and CPU) but how we use it: what we keep, what we throw away, what we process, the connections we make, etc. in essence, the Software and algorithms we use.

The Internet
and you would think that the Internet and technology would help with this last bit; however, they can actually make our brain function less efficiently. our brain can become constantly interrupted by lots of small nuggets of info (sms, email, the web, advertising), making our algorithms run less efficiently. viewing the brain as a computing platform, this creates a constant burst of short interrupts that prevents us from doing the long-term thinking we need to be who we are. basically our brain is thrashing all the time and only manages to keep some very superficial thoughts, trends and fashions.

TIME
i recently heard Ferran Adria (elBulli) say that the only difference between what he does and what other cooks do, and what makes him so creative, is that he makes “time” to think and create. in this sense, elBulli is the only restaurant in the world that closes 6 months a year to create. and now i think: when do you get off the internet to create and think? when do you shut your brain off from the world to create, think deeply and decide what you want to do and where you want to go next, to be in control of the boat…

i have been saying for a while that Hardware is becoming a commodity and Software is where the value is. well, that is only partly true: i am also thinking that TIME is where the value is. Software translates into a number of algorithms that help your brain do two things: a) quickly filter all the info you receive in real time, keep whatever you think is useful, compare it with what you already know, summarize it and store it, and b) run background algorithms that help you think creatively, innovate, delete things you have duplicated, summarize things at different levels of granularity, and create new connections between hunches which could become the next big idea.

The first set of algorithms acts in real time and requires agility and speed. The second set requires TIME to process, in the same way that Google requires time to process Terabytes of information to help you find what you want.

PRIVACY and HAPPINESS
but more and more we happily give all our information away. we easily become naked on the Internet, sharing where we eat, what we do, where we are. the risk is that we may end up in a situation where, with all this info, companies or governments will be able to steer us towards certain decisions. i fear a lot of people may be happy taking this road, and we may end up living in a society like the one Huxley predicted in Brave New World, where people became happy at the cost of having no privacy or control over their lives, trading privacy for pleasure. i fear that as a society it is already too late to change things and that we are going down that road faster than you think, but not as individuals.

and it is as an individual that i stand up for the joy of reading for pleasure, of wasting time, of art, literature, philosophy, getting bored, taking time to do some thinking, of focusing on one single thing, of contemplation, of stepping back, of having a private life and deciding when and why not to make it public, understanding the cost and value of privacy; in summary, of being free.

if you don't control technology, it will control you, and that is why i am not closing this blog: i stand up for blogging as a way to reflect, to talk about random things, and to move away from a “twittered” life. it may just take me a bit more TIME to update it :-)

Content Centric Networking:

It has been a while since I read a paper with so much hope, joy, and intrigue. I am talking about the latest work by Van Jacobson and his team at PARC on Content Centric Networking (CCN). The paper was presented at CoNEXT in Rome last month, which, by the way, is becoming a much stronger venue with more and more interesting, innovative work. CCN is one of the best proposals I have seen in the Content Distribution space in the last decade; the last time I felt this excited about a piece of work in this space was when I read the BitTorrent paper. CCN basically tries to democratize Content Distribution and redesign the Internet by placing content, and not machines, at its core.

Since the beginning, the Internet has been designed around communication between machines over IP. Most Internet traffic today, however, comes from users retrieving content, and the Internet was not optimized for that (e.g. lack of multicast support). This has often resulted in wasted Internet resources, with routers copying the same data packets over and over again, or with swamped servers. Over the years, we have overcome such problems with imaginative overlay solutions such as Web caching, Content Distribution Networks, and P2P networks, which have worked marvels but have also suffered from being afterthoughts bolted onto the original Internet design.

It was about time that someone took a stand and redesigned the Internet protocol stack with content at its core. In this regard, Van Jacobson's effort makes a lot of sense, and it is one of the most interesting proposals for a “Future Internet” design that I have seen (whatever that term means anymore…). Van Jacobson calls CCN a Copernican revolution, since it places content at the center, and he hopes it will have the same impact as when the sun, and not the earth, became the reference point for explaining the universe.

CCN provides benefits on various fronts, including better use of Internet resources, location-independent content routing, and content security and control. All this is great and could spark a number of innovations, research ideas, and new designs that can take this concept to the next level. As I write this post, I am trying to clear my thoughts and identify which pieces of this work will have the most impact, so that it does not end up as an exercise in what could have been, or drag on forever in the standardization process as a “solution in search of a problem”, as has been the case with IPv6.

The first interesting part of the work is that it democratizes Content Distribution and ensures that anyone, not just those in a position to pay a CDN, can enjoy the benefits of an Internet-broadcast service that amplifies your data whenever and wherever it is needed. With CCN, Content Distribution becomes a “public” service (in the European sense) of the Internet. In a sense, P2P has done much of that, providing a public service that publishers can use to propagate their content to many users at very low cost. However, this has been done without taking ISPs into account, which causes various inefficiencies. CCN, instead, operates inside the ISP's network and thus has a better chance of succeeding. Nevertheless, there are already solutions out there for ISPs to deploy cache overlays and ISP-CDNs, making content distribution more efficient for all. So far, whether an ISP deploys content-aware storage infrastructure has been an economic problem rather than a protocol problem. The decision to deploy storage in the network has been a function of the ISP's topology, workload, and various economic trade-offs (e.g. the cost of bandwidth vs the cost of deploying and operating storage nodes), not of a lack of technical means to do so. It would be great if CCN could lower the cost for ISPs of deploying storage in their networks; otherwise, HTTP-based CDNs are likely to remain the way to go for many years to come, since the investment in them and the know-how around them are already high.
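To make that economic trade-off concrete, here is a back-of-the-envelope Python sketch. It assumes a deliberately simple linear cost model (all names and numbers are hypothetical, not figures from any ISP): a cache avoids the transit cost of every byte it serves locally, and costs a flat amount per month to deploy and operate. Real decisions also depend on topology, peering agreements and workload, which this ignores.

```python
def monthly_net_saving(traffic_gb: float, hit_ratio: float,
                       transit_cost_per_gb: float, storage_cost: float) -> float:
    """Transit cost avoided by serving cache hits locally, minus the cache's monthly cost.

    Hypothetical linear cost model, for illustration only.
    """
    avoided = traffic_gb * hit_ratio * transit_cost_per_gb
    return avoided - storage_cost


def break_even_hit_ratio(traffic_gb: float, transit_cost_per_gb: float,
                         storage_cost: float) -> float:
    """Hit ratio at which the cache exactly pays for itself under the same model."""
    return storage_cost / (traffic_gb * transit_cost_per_gb)
```

Under this model, anything that lowers `storage_cost` (which is what the post hopes CCN might do) lowers the break-even hit ratio proportionally, so caching starts to pay off for less popular content.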

The second interesting part of CCN is that it decouples content from location, and the mapping between content and location is done via routing. This is very important. Currently, Google (or a similar search engine) maps keywords to content URLs, and DNS then maps the host name in the URL to a machine location (e.g. its IP address). DNS has been one of the weakest points of the Internet in recent years, being the target of continuous DoS attacks and causing significant service disruptions, so any solution in this regard is welcome. With CCN, DNS functionality is to some extent embedded and distributed in every routing node, making it more resilient and scalable. Rather than trusting DNS to map host names to IP addresses, CCN avoids DNS altogether and trusts the content itself, which can sit on any machine on the path to the data and can be retrieved from any cached copy along that path. This also provides very nice support for DTN-like communications, where connectivity and arbitrary nodes can appear and disappear at any moment. The drawback is that each router now has to do more work to verify data and keep more state in its routing entries, since it routes on names rather than IP addresses. This can cause some scalability issues and open the door to DoS attacks; however, I am confident this is solvable with various optimizations.
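The name-based forwarding described above can be sketched as a toy node holding the three tables described in the CCN paper: a Content Store (cached data), a Pending Interest Table (requests already forwarded upstream) and a FIB keyed on name prefixes. This is a simplified Python sketch, not the PARC implementation. Names are tuples of components, Content Store lookups are exact matches, and signature verification, timeouts and multiple outgoing faces are left out. The class and method names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Name = Tuple[str, ...]             # hierarchical name, e.g. ("foo.com", "videos", "clip1")
Packet = Tuple[str, str, object]   # (kind, face, payload) emitted by the node


class ContentRouter:
    """Toy CCN-style node: Content Store, Pending Interest Table (PIT) and FIB."""

    def __init__(self) -> None:
        self.content_store: Dict[Name, bytes] = {}  # cached data, keyed by full name
        self.pit: Dict[Name, Set[str]] = {}         # name -> faces still waiting for it
        self.fib: Dict[Name, str] = {}              # name prefix -> upstream face

    def add_route(self, prefix: Name, face: str) -> None:
        self.fib[prefix] = face

    def _longest_prefix_match(self, name: Name) -> Optional[str]:
        # Try the full name first, then progressively shorter prefixes.
        for length in range(len(name), 0, -1):
            face = self.fib.get(name[:length])
            if face is not None:
                return face
        return None

    def on_interest(self, name: Name, in_face: str) -> List[Packet]:
        if name in self.content_store:      # a cached copy answers immediately
            return [("data", in_face, self.content_store[name])]
        if name in self.pit:                # already requested upstream: remember who
            self.pit[name].add(in_face)     # else is waiting, but do not forward again
            return []
        out_face = self._longest_prefix_match(name)
        if out_face is None:                # no route for this name: drop
            return []
        self.pit[name] = {in_face}
        return [("interest", out_face, name)]

    def on_data(self, name: Name, payload: bytes) -> List[Packet]:
        waiting = self.pit.pop(name, None)
        if waiting is None:                 # unsolicited data is discarded
            return []
        self.content_store[name] = payload  # cache it for future interests
        return [("data", face, payload) for face in sorted(waiting)]
```

A second request for the same name is aggregated in the PIT instead of being forwarded again, and a later request is answered from the cache without going upstream; that is the behavior that removes the need to copy the same packets over and over.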

Another limitation of the current DNS service that CCN solves is that DNS only resolves host names (e.g. www.foo.com); it cannot resolve different pieces of data under the same host name to different IP addresses (e.g. www.foo.com/file1.html and www.foo.com/file2.html). This limits the possibility of downloading parts of the content from nearby machines and of doing multiple parallel downloads. The alternatives are to use a different domain name for each file (e.g. www.foo1.com/file1.html and www.foo2.com/file2.html) or to use intercepting proxies with L7 switches; however, neither is very convenient, because they require either rewriting the content or deploying expensive hardware. To me, fixing these DNS limitations is likely to be one of the strongest selling points of CCN (as long as the extra cost at each router is low).
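The difference can be illustrated with two lookup functions in Python: one that, like DNS, only sees the host name, and one that matches the full hierarchical name by longest prefix, as a name-based router would. The tables, addresses and location labels are hypothetical placeholders.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlsplit


def dns_style_lookup(url: str, host_table: Dict[str, str]) -> Optional[str]:
    """Like DNS: only the host name is visible, so every path gets the same answer."""
    host = urlsplit(url).hostname or ""
    return host_table.get(host)


def name_style_lookup(url: str, name_table: Dict[Tuple[str, ...], str]) -> Optional[str]:
    """Name-based: match host plus path components, longest prefix first."""
    parts = urlsplit(url)
    name = (parts.hostname or "",) + tuple(c for c in parts.path.split("/") if c)
    for length in range(len(name), 0, -1):
        location = name_table.get(name[:length])
        if location is not None:
            return location
    return None
```

With a table holding one entry for the host and one per file, the first function returns the same address for file1.html and file2.html, while the second can send each file to a different nearby copy and fall back to the host-level entry for anything else. Note that the URLs need a scheme (e.g. `http://www.foo.com/file1.html`) for `urlsplit` to find the host.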

The third interesting part of CCN is content security and control. Control and trust are part of the content itself, rather than a property of the IP connections it traverses. Since any intermediate machine along the path can reply with a cached copy, content needs to be signed with the publisher's key, and content routers need to verify that the content was produced by its owner. This allows opening the network to wider participation, determining provenance, tracking where content has been in the network, and evidence-based security, where it becomes hard for an attacker to subvert a publisher by forging content that claims to be signed with the publisher's key. Similar mechanisms have been implemented in secure P2P systems such as Microsoft's Avalanche, and they can be key to CCN's success. Revocation is also one of the major headaches of CDNs and secure P2P systems, and the current CCN proposal leaves it as future work. One last thing CCN should support is allowing intermediate network nodes to become trusted sources so that they can modify content as needed (e.g. re-encoding images to fit mobile phones). Both revocation and on-the-fly content modification may complicate the current CCN design, but both seem doable. The bigger question around CCN security is: what can one do with CCN, in terms of content protection and security, that one cannot do by protecting content at the application layer (e.g. DRM)? My guess is that provenance and traceability of content are likely to be part of the answer.
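To illustrate what “routers verify that the content was produced by its owner” involves, here is textbook RSA signing over a hash of the name and the content, in Python with only the standard library. Everything here is an assumption for illustration: the key is built from two small, well-known Mersenne primes, there is no padding scheme, and this is not CCN's actual packet format or signature algorithm. A real system would use a vetted signature scheme from a cryptographic library. What it does show is the asymmetry: only the publisher can sign, but any router holding the public key (N, E) can check a cached copy.

```python
import hashlib

# TOY key pair from two Mersenne primes (2**61 - 1 and 2**89 - 1).
# Far too small and structured for real use; illustration only.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (modular inverse needs Python 3.8+)


def digest(name: str, content: bytes) -> int:
    """Hash the name and the content together so a signature binds one to the other."""
    h = hashlib.sha256(name.encode() + b"\x00" + content).digest()
    return int.from_bytes(h, "big") % N


def sign(name: str, content: bytes) -> int:
    """Publisher side: requires the private exponent D."""
    return pow(digest(name, content), D, N)


def verify(name: str, content: bytes, signature: int) -> bool:
    """Router or consumer side: requires only the public key (N, E)."""
    return pow(signature, E, N) == digest(name, content)
```

Any change to either the name or the bytes makes verification fail, which is what would let a router accept data from an untrusted cache.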

As you can see, lots of questions but lots of excitement too. One final comment: I hope it is not too late for such a clean content networking solution to move forward, given the plethora of alternative solutions already out there (e.g. CDNs and P2P). The inertia could also be such that, by the time something similar to CCN gets deployed on the Internet, the Internet has already changed focus again, say from content networking to video conferencing. Then it would really feel like we are chasing an elusive ghost: we design for machine communications and along comes content, we design for content and along comes conferencing, etc. Ah! One last thing: while reading the CCN paper, it occurred to me that it is about time Google started doing page rank using content signatures (e.g. Rabin fingerprints) to solve the content aliasing problem: using links is so broken!! :-)
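A content-signature approach to aliasing could start by grouping URLs whose bodies hash to the same value. The sketch below uses a Karp-Rabin style polynomial hash modulo a large prime as a stand-in. True Rabin fingerprints are computed with polynomials over GF(2), and a real system would normalize pages (markup, whitespace, boilerplate) and compare shingles to catch near-duplicates rather than only exact copies. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

MOD = (1 << 61) - 1   # a large prime modulus (2**61 - 1)
BASE = 257            # fixed base; a real system would choose it at random


def fingerprint(data: bytes) -> int:
    """Karp-Rabin style polynomial hash of the whole byte string."""
    h = 0
    for byte in data:
        h = (h * BASE + byte) % MOD
    return h


def group_aliases(pages: Dict[str, bytes]) -> List[List[str]]:
    """Return groups of URLs whose bodies share a fingerprint (likely the same content)."""
    by_fp: Dict[int, List[str]] = defaultdict(list)
    for url, body in pages.items():
        by_fp[fingerprint(body)].append(url)
    return sorted(sorted(urls) for urls in by_fp.values() if len(urls) > 1)
```

A ranking system could then treat each group as one document and pool the links pointing at any of its URLs, instead of splitting credit across copies.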

My genetics results: 23andMe

This week I got my genetics results from 23andMe, and I have been fascinated ever since. You basically send a saliva sample and they analyze your DNA in about three weeks. Then they run a number of analytics on your DNA and tell you a bunch of interesting things about yourself: e.g. where your family comes from, which illnesses you are likely to develop, or what you might like the most. For instance, it seems I have a higher chance than the average person of getting prostate cancer in the future, the same risk as average of getting arthritis, and a lower-than-average risk of getting diabetes. The analysis also tells me that I am unlikely to go bald (which I am not), that my eyes are likely brown/green (which they are), and that my muscles are better suited to sprinting than to endurance running (which is very true). I would take all these things with a pinch of salt, since they are statistics after all and a lot depends on your lifestyle, but it has made me think about a few things to watch out for in the future. For instance, one of the first things I did was to look for the G2019S mutation in the LRRK2 gene, which is known to considerably raise the chances of Parkinson's disease (see the recent post by Sergey Brin about his own case). I don't seem to carry this mutation, and my risk of Parkinson's disease is actually much lower than average; however, I will keep an eye out for other risky mutations.

Regarding my family heritage, my maternal mitochondrial DNA haplogroup is H3 and my paternal Y-chromosome haplogroup is R1b. This means that my ancestors on my mum's genetic line come mostly from Asturias in northern Spain, the same genetic line that later spread over the south of England. As for my dad's line, it seems I am a lot more mixed, with Germanic, African, and Irish influences.

There have been similar efforts, like the Genographic Project by National Geographic (photo above). However, the interesting thing here is that as scientists develop more analytics, they will be able to tell you a lot more about yourself: e.g. the probability that you will marry soon vs date for a long time, the best drug for you, or which foods will give you more joy. This could make a great gift for your family or friends: knowing how many genes you got from your dad vs your mum, or the chances that your kids will be blond or have blue eyes… Welcome to the future!

You and your Research (Hamming)

I just finished reading the transcript of a talk Hamming gave at Bell Communications Research (Bellcore) in 1986, which includes some great advice on research and on work spirit in general.

http://www.chris-lott.org/misc/kaiser.html

These are some extracts:

‘ … each of you has one life to live. Even if you believe in reincarnation it doesn’t do you any good from one life to the next! Why shouldn’t you do significant things in this one life, however you define significant?

I find that the major objection is that people think great science is done by luck …. And I will cite Pasteur who said, “Luck favors the prepared mind.” … The prepared mind sooner or later finds something important and does it. So yes, it is luck. The particular thing you do is luck, but that you do something is NOT … Newton said, “If others would think as hard as I did, then they would get similar results.”

One of the characteristics of successful scientists is having courage. Once you get your courage up and believe that you can do important problems, then you can. If you think you can’t, almost surely you are not going to.

“Knowledge and productivity are like compound interest.” Given two people of approximately the same ability and one person who works ten percent more than the other, the latter will more than twice outproduce the former. The more you know, the more you learn; the more you learn, the more you can do; the more you can do, the more the opportunity; it is very much like compound interest. I don’t want to give you a rate, but it is a very high rate. Given two people with exactly the same ability, the one person who manages day in and day out to get in one more hour of thinking will be tremendously more productive over a lifetime.

Great scientists tolerate ambiguity very well. If you believe too much you’ll never notice the flaws; if you doubt too much you won’t get started. It requires a lovely balance.

For those who don’t get committed to their current problem, the subconscious goofs off on other things and doesn’t produce the big result. So the way to manage yourself is that when you have a real important problem you don’t let anything else get the center of your attention; you keep your thoughts on the problem. Keep your subconscious starved so it has to work on your problem, so you can sleep peacefully and get the answer in the morning, free.

If you do not work on an important problem, it’s unlikely you’ll do important work. It’s perfectly obvious.

I have now come down to a topic which is very distasteful; it is not sufficient to do a job, you have to sell it. ‘Selling’ to a scientist is an awkward thing to do. It’s very ugly; you shouldn’t have to do it. The world is supposed to be waiting, and when you do something great, they should rush out and welcome it. But the fact is everyone is busy with their own work. You must present it so well that they will set aside what they are doing, look at what you’ve done, read it, and come back and say, “Yes, that was good.”’
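Hamming's compound-interest remark can be made concrete with a toy model, which is my assumption for illustration and not Hamming's own: suppose knowledge K grows in proportion to what you already know and to your working effort w, at some rate r. Comparing two people with the same ability (same r and starting knowledge K_0), one of whom works 10% more:

```latex
\begin{aligned}
\frac{dK}{dt} &= r\,w\,K \quad\Longrightarrow\quad K_w(T) = K_0\,e^{r w T},\\
\frac{K_{1.1w}(T)}{K_w(T)} &= \frac{K_0\,e^{1.1\,r w T}}{K_0\,e^{r w T}} = e^{0.1\,r w T} = \left(\frac{K_w(T)}{K_0}\right)^{0.1},\\
\frac{K_{1.1w}(T)}{K_w(T)} > 2 &\iff r w T > 10\ln 2 \approx 6.93 \iff \frac{K_w(T)}{K_0} > 2^{10} = 1024.
\end{aligned}
```

In this model the extra 10% only doubles the output once the baseline person's knowledge has grown about a thousandfold, which fits Hamming's comment that the rate must be “very high”.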

The *Trust* Economy

Clearly, what has happened over the last few days in the financial markets will give people a lot to think about. Some will blame capitalism, others politicians, others greedy human nature, others corruption and reckless financial engineering, others the lack of regulation…

While all of the above has had an impact, I believe the root of the problem has been the failure of the trust and reputation tools used to determine the risk of financial assets. In the 21st century, we still do not know how to quantify *trust* in large, uncontrolled environments (e.g. where financial assets get recombined all the time). In fact, the world has become so complex that we do not even know how to measure the value of what we create! … and this generates bubble after bubble after bubble.

And you may wonder why I am talking about this. Well, because the Internet also suffered (and still suffers) from similar problems of trust and reputation, and some Internet tools may help here. The Internet is also a large-scale, unregulated distributed system, where information appears and gets mixed and remixed all the time (web pages, content aggregators, blogs, microblogging, etc.). Before Google, it was rather cumbersome to find what you wanted and to trust that it was the best page on that topic. But Google provided a simple yet scalable way to evaluate the rank of a page and, even more, managed to monetize that ranking information through ads (charging advertisers who want to appear close to highly ranked pages), thus probably creating the first bank and currency of the new trust economy.
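For readers who have not seen it, the core idea behind page rank can be sketched in a few lines of Python using power iteration: each page repeatedly passes a share of its score to the pages it links to, plus a small uniform “teleport” share. This follows the textbook formulation with a damping factor of 0.85 (the value suggested in the original PageRank paper); it is not Google's production ranking, and the function name and example graph are hypothetical.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 100) -> Dict[str, float]:
    """Power-iteration PageRank over a small link graph {page: [pages it links to]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}  # uniform teleport share
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page (no outgoing links): spread its rank evenly.
                for t in pages:
                    new[t] += damping * rank[p] / n
        rank = new
    return rank
```

A page nobody links to ends up with only the teleport share, (1 - 0.85)/n, while pages that others link to accumulate more: a crude, link-based measure of trust that scales to very large graphs.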

As we move forward, reputation and trust will be among the most important things you care about. After all, your time is finite and probably your most precious asset, and finding trusted goods is very time-consuming. I would personally value being able to quickly find a trusted doctor, a reputable travel agency that offers me a good vacation package, or a good fund that won't be full of subprime loans. The same goes for companies, which will care about their online reputation at any point in time (e.g. right after launching a new product). So who will become the Wall Street of this new trust economy? Google? Facebook? Telcos? Actually, Telcos are in a very good position to become a bank of trust, given that many of our online transactions go through them.

Another problem with measuring trust is how to do it at scale. When a bank wants to lend to another bank or to an individual, it needs to determine the financial risk involved; however, with millions of customers and investments, it is very hard to do this well. P2P has been very successful at scaling the delivery of information, so what about P2P lending and P2P reputation? This way, the risk assessment process scales much better: each potential lender evaluates the risk of each potential borrower, and rather than borrowing from the bank, you borrow from your neighbor. Done well, you shouldn't have to pay higher interest than banks charge, and as a lender your risk should be lower. See the CNN article on Peer-to-Peer lending.

So I would suggest that we pay less attention to the housing and information/IT economies and focus on the new Trust economy.

How 3D printers could change our lives (and create new opportunities for Telcos)?

3D printers present a whole new range of opportunities for users and telecommunications companies. They could also completely reshape the landscape of factories and shops as we know them today.

They look like ordinary printers (a bit bigger), and they produce a 3D object from a digital model by laying down successive layers of a special material until the object is complete. You can use different materials, ranging from polymers to titanium or even gold powder.

What could you print? I can think of things such as industrial components (pipes, parts for cars), clothes (e.g. shoes), furniture, jewelry, and why not, chip designs for electronics, and food!! (see this article for a printer that produces sugar objects)

For now they are mostly used to build models for architects and fashion designers, and they are a bit slow, but you can imagine how the technology could improve over the years to come.

I first saw one working last year at the Renacer conference and since then I have been thinking about their possible implications.

How many times have you waited for a product that is out of stock? What if you could just download a detailed digital design of the product and have it printed at home?

At that point, a lot of factories and shops could well disappear! Everything would be intellectual property and data flowing around. We would just spend time thinking and designing, and much less time on manual labor. Finally, humankind would be freed to do what it does best: thinking. That would be a revolution!

And for Telcos and networking companies that would be a great opportunity too. Imagine how many terabytes of data would need to be shipped from one corner of the world to another to describe a given product in the finest level of detail so that the printer could build it. Huge volumes of data would flow from designers directly to users' homes, and that would need to happen in a timely manner. We would be shipping bits, not physical goods, and Telcos would become the FedEx of the Internet! Who said that networking was a dead field? :-)
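A quick back-of-the-envelope calculation shows why this would matter for networks. The sketch below assumes an idealized link with no protocol overhead or congestion, and takes one terabyte (10^12 bytes) as an illustrative design size; both the size and the link speeds are assumptions, not figures for any real product.

```python
def transfer_hours(size_bytes: float, link_bits_per_second: float) -> float:
    """Idealized transfer time in hours (ignores protocol overhead and congestion)."""
    return size_bytes * 8 / link_bits_per_second / 3600


for label, bps in [("100 Mbit/s", 100e6), ("1 Gbit/s", 1e9)]:
    print(label, round(transfer_hours(1e12, bps), 1), "hours")
```

Under these assumptions, one terabyte takes about 22 hours over a 100 Mbit/s line and about 2.2 hours over 1 Gbit/s, so “in a timely manner” would indeed require much fatter pipes to the home.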

For more info you can also see this Economist article.

On unlocking the iPhone…

Has the iPhone really been hacked, or has just a particular version of its software been compromised? With the latest software versions you are basically stuck, temporarily (or permanently). If you want a more permanent solution, you need to resort to painful hardware-based solutions.

The reason I say this is that a friend of mine recently bought an iPhone in the US, hoping to unlock it with software, use it in Europe and give it as a Christmas present. However, the iPhone is still sitting in its box, waiting for somebody to break the new bootloader (see this blog for some related efforts: http://11246unlock.com/index.asp).

Even if somebody manages to unlock the latest software (which will surely happen eventually), the rate at which iPhone software versions are being hacked is slowing down, and Apple could easily keep turning the screw by releasing new functionality more often, making life harder and harder. And things could get worse if Apple decides to use some sort of revocation system, similar to those used in many DRM systems. With DRM, content owners or distributors can revoke access for all previously hacked versions of the DRM software, forcing you to keep your device updated.
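A server-side revocation check of the kind described above could be as simple as the following Python sketch: reject any software version on a revocation list and any version below a minimum. This is a hypothetical illustration of the general idea; it does not describe how Apple or any particular DRM system actually implements revocation.

```python
from typing import FrozenSet, Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Version:
    """'1.1.0' -> (1, 1); trailing zeros are dropped so '2.0' equals '2.0.0'."""
    parts = [int(p) for p in text.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def is_allowed(client_version: str, minimum: str, revoked: FrozenSet[str]) -> bool:
    """Refuse explicitly revoked versions and anything older than the minimum."""
    version = parse_version(client_version)
    if version in {parse_version(v) for v in revoked}:
        return False
    return version >= parse_version(minimum)
```

Each time a version is cracked, the vendor only has to add it to `revoked` (or raise `minimum`) on its servers, and every device running that version is forced to update, which is exactly the screw-turning the post describes.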

So I guess that, after too much hassle, hackers, having proved their point, will just give up, and eventually consumers will too.

Japanese Food…

I have been wanting to talk about Japanese food for a while. I think it exemplifies very well the title of this blog, “keep it sweet and simple”. And I am not just talking about sushi, but about the large variety of small, delicately cooked dishes, served together on little plates and in beautifully decorated bowls.

When presented with the food, you feel a bit like an orchestra conductor, with lots of different instruments to play with, or rather, different foods to try. You can combine different textures, sweeter or spicier tastes, in whatever order you fancy, and then, whenever you want, you come back to the sticky rice or the delicious soup, which set the beat and reset your palate for the next food composition. And the fascinating thing is that in each round you can try something different.

This is very different from traditional European food, where courses come one after the other and there is very little flexibility in how and in which order you eat things. Ah… and a very important thing: regardless of how much you eat, it always sits very well with you, letting you enjoy it for hours to come. What good is excellent food if it leaves you with a heavy and painful evening…?

Brief thoughts on Spain

Ok… a new year and a new twist to this blog. I have decided that from time to time I will expand the topics and briefly talk about other things such as politics, food, and other interesting random things that I bump into (i.e. not just research).

So here is the first one: since I moved back to Spain, I have noticed how much effort we spend discussing issues such as terrorism or national identity, which, although extremely important, are likely diverting much of the focus and energy needed to tackle the other major challenges that Spain faces over the coming years.

I have recently read two articles which, I believe, crystallize some of these problems very well (e.g. education, the labor market, culture). In general, many similar things could be said about most southern EU countries, not just Spain.

The second transition (the Economist, 2008):
On the challenges that Spain faces over the long run and how to avoid a “gentle decline”.
[Article]

Locals vs Cosmopolitans (Xavier Sala i Martin, La Vanguardia, 2007):
Extremely well-written article on how the world can be viewed from either a local or a global point of view, and on the challenges a country/region faces when globalization kicks in and you still think locally.
[Spanish Article]
[Google Translated]
