Saturday, October 30, 2010

Thrift for serializing/deserializing objects in Membase

First off, if you haven't heard of Membase, you should check it out. It's an evolution of sorts from memcached.

Typically, when you use memcached or Membase to store and retrieve key-value data, the value is not a simple datatype. Instead, it is most likely a serialized representation of some complex, application-specific data structure. It's great to set or get a complex data structure with a single remote call like this. But what can quickly become a problem is the performance of the serialize/deserialize operations that need to happen with every set/get operation.

With PHP, the obvious way to do this is to use the language's built-in serialization facility. Since the serialized format is an ASCII-based format, I would guess that its performance is not optimal (especially for deserialization). Also, one would want to compress the data to reduce transfer and storage costs. This again adds to the cost of each set/get operation.
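
As a rough illustration of where those costs come from, here is a minimal Python sketch (not PHP, and not the Membase client API) of the full round trip: serialize, compress, set, then get, decompress, deserialize. The FakeCache class, the cache_set/cache_get helpers and the sample record are all made up for this example. A plain dict stands in for the cache server, JSON stands in for PHP's text-based serialization, and zlib stands in for whatever compressor you pick.

```python
import json
import zlib


class FakeCache:
    """Hypothetical stand-in for a memcached/Membase client: just a dict."""

    def __init__(self):
        self._store = {}

    def set(self, key, value):
        self._store[key] = value

    def get(self, key):
        return self._store.get(key)


def cache_set(cache, key, obj):
    """Serialize to text, compress, and store; return (raw, packed) sizes."""
    raw = json.dumps(obj).encode("utf-8")
    packed = zlib.compress(raw)
    cache.set(key, packed)
    return len(raw), len(packed)


def cache_get(cache, key):
    """Fetch, decompress, and deserialize; return None on a miss."""
    packed = cache.get(key)
    if packed is None:
        return None
    return json.loads(zlib.decompress(packed).decode("utf-8"))


# A made-up "complex" value: a user record with a list of nested items.
profile = {
    "user_id": 42,
    "name": "example",
    "friends": [{"id": i, "status": "active"} for i in range(200)],
}

cache = FakeCache()
raw_size, packed_size = cache_set(cache, "user:42", profile)
print("serialized bytes:", raw_size, "compressed bytes:", packed_size)
print("round trip ok:", cache_get(cache, "user:42") == profile)
```

Every set pays for a serialize plus a compress, and every get pays for a decompress plus a deserialize, which is exactly the overhead worth measuring before picking a format.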

I'm looking at one such application that could be optimized to work more efficiently in these areas. I've looked at Google Protocol Buffers. It's very easy to understand and use, and it has very good documentation. Unfortunately, it doesn't have good support for PHP. So I'm now looking at Thrift. Thrift was initially developed by Facebook for use primarily with PHP, along with other languages. So it has good support for PHP, and its performance and functionality are comparable to those of protobufs. But its documentation seems to be too sparse.

On Compression
LZO is a more suitable compression algorithm for this use, for reasons of CPU and memory efficiency. When compression is part of web request handling, one has to carefully weigh the trade-off between compressed size and speed.
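
To get a feel for the size-versus-speed trade-off, a quick measurement like the following can help. This is only a sketch: Python's standard library has no LZO binding, so zlib at different compression levels is used as a stand-in, and the payload and the measure helper are invented for the example. The absolute numbers will differ for your data and your machine.

```python
import json
import time
import zlib


def measure(payload, level):
    """Compress payload at the given zlib level; return (size, seconds)."""
    start = time.perf_counter()
    packed = zlib.compress(payload, level)
    elapsed = time.perf_counter() - start
    # Sanity check: the data must survive the round trip unchanged.
    assert zlib.decompress(packed) == payload
    return len(packed), elapsed


# Made-up payload: a serialized list of small records.
payload = json.dumps(
    [{"id": i, "tag": "item-%d" % (i % 17), "score": i * 0.5} for i in range(5000)]
).encode("utf-8")

# Level 1 favors speed, level 9 favors size, level 6 is zlib's usual middle ground.
results = {level: measure(payload, level) for level in (1, 6, 9)}

print("uncompressed: %d bytes" % len(payload))
for level, (size, seconds) in results.items():
    print("level %d: %d bytes in %.2f ms" % (level, size, seconds * 1000))
```

For request handling, the interesting question is whether the bytes saved at a higher level are worth the extra milliseconds spent on every set.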

Sunday, April 04, 2010

Need for services with a guarantee of security and privacy

Current Situation
Pretty much all of the online communication services we use today, like e-mail and social media sites, are "free" services in the sense that we don't pay any subscription fee for using them. And as such, the "Terms of Service" are heavily tilted towards the service provider.

In most cases, the only way such a free service provider makes money is by mining the data it collects when we use the service. Every time we use such a service, we input some data for query, transmission or storage. Most of the time this data is sensitive, confidential and private: your contacts and personal messages, which reveal who you are, what you like or don't like, and what, when and where you do things.

Mining this information to profile users and show them targeted ads, or to do market demographics research and sell the results to marketers, is the most common way of making money. In such cases, no particular user's data is specifically exposed, as it's all aggregate information. So such uses may be acceptable.

But what is scary is unauthorized or accidental data leakage, theft of data by illegal "hackers", or even government-backed agencies getting access to this data to spy on people or to conduct corporate espionage.

What's needed?
I don't know if there ever will be a complete solution to this problem. But, to start with, we need guarantees about privacy and security from service providers. These guarantees should be verified and certified by multiple third-party agencies. They should be scientifically provable. And there should be stringent consequences for breaching them, whatever the reason may be.

Now, I know such security is difficult and will cost a lot of money. So, it is acceptable to have subscription fees to cover such services.

What is alarming today is that there are absolutely no such services in existence. Even if someone values their privacy and security and is willing to pay a subscription fee for it, they have no choice but to use ad-powered free services.

Sunday, February 14, 2010

Flash video sucks!

Summer is almost here. With the rising ambient temperature, my MacBook gets hot sooner.
It gets hot even sooner when my browser is open. The reason is the all-pervading Adobe Flash Player based ads and video players on web pages.

This has made watching videos on YouTube or ted.com an unpleasant experience. If I watch a video in full-screen, I notice both cores on this MacBook running at a full 100%. That's horribly wrong when playing the same video in a standalone video player like VLC needs less than 1%.

This really needs to be fixed. Something is horribly broken here. Is this just me, or is everyone else simply putting up with this problem?

Friday, January 15, 2010

Google finally getting into data backup!?!

With their latest announcement that Google Docs will host any type of file, Google is foraying into the "your data in the cloud, access, organize, share - anytime from anywhere" business that we have been envisioning for a long time (over 4 years now!).

What's interesting is the approach that Google has taken. Instead of the traditional approach of building all the features for this product vision from the ground up and releasing the end product, Google has built seemingly independent products and tested the waters first. And once users have accepted each of those individual pieces reasonably well, Google is integrating them all to provide a powerful experience. (Privacy conspiracy theorists may say this is much like slowly boiling a frog in water!)

Interestingly enough, Google's price of storage per GB seems to be the cheapest at the moment, at $0.25/GB/year. But their initial free offering is just 1 GB with a 250 MB file size limit. At this price, it seems cheaper than Amazon. And as expected for an end-user product, there are no transfer charges (bandwidth costs). In comparison, Microsoft SkyDrive offers 25 GB of free space with a 50 MB file size limit.

Even though Google doesn't have its own backup client that can run on your desktop like traditional backup clients, I'm sure that, given the good data APIs available to third-party developers, many such clients will crop up like mushrooms.

Surely this will change the market for good in the long term. Let's see how the traditional backup companies (including us) will react to this.

Saturday, January 09, 2010

Macbook Pro Battery

One thing I realized with my last MacBook (White) is that putting your MacBook to sleep all the time and never shutting it down (especially overnight) is not good for your battery. By doing that I consumed more battery cycles, and now the battery discharge time has come down to just over 2 hours. Last week I got a new MacBook Pro. And this time I figured out how to make it hibernate (not sleep) upon closing the lid. With that, I've managed to use only 3 battery recharge cycles in the last week.

Here's how to put your MacBook into deep sleep (hibernate).
Put these two lines in your ~/.bash_profile:

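# hibernatemode 0: plain sleep, RAM stays powered and nothing is written to disk
alias hibernateoff='sudo pmset -a hibernatemode 0'
# hibernatemode 5: hibernate only, memory is written to disk and RAM is powered off
alias hibernateon='sudo pmset -a hibernatemode 5'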

And whenever you are about to close your lid (like before you go to bed), just turn hibernate on by invoking hibernateon in a terminal. At other times, when you don't want it to go into deep sleep, just turn hibernate off with hibernateoff. This is useful when you close the lid while moving between meeting rooms at the office.

Also, I realized that these new batteries don't need to be discharged and recharged regularly, as they don't have the "memory" problem that older battery technologies did. So I use the battery only when I need to and stay on the power adaptor when I can. This way I can keep my battery cycle count low.

Here's my battery info for future (self-) reference:
+-o AppleSmartBattery  
    {
      "ExternalConnected" = Yes
      "TimeRemaining" = 0
      "InstantTimeToEmpty" = 65535
      "ExternalChargeCapable" = Yes
      "CellVoltage" = (4189,4189,4190,0)
      "PermanentFailureStatus" = 0
      "BatteryInvalidWakeSeconds" = 30
      "AdapterInfo" = 0
      "MaxCapacity" = 5573
      "Voltage" = 12568
      "Quick Poll" = No
      "Manufacturer" = "DP"
      "Location" = 0
      "CurrentCapacity" = 5573
      "LegacyBatteryInfo" = {"Amperage"=226,"Flags"=5,"Capacity"=5573,"Current"=5573,"Voltage"=12568,"Cycle Count"=3}
      "BatteryInstalled" = Yes
      "FirmwareSerialNumber" = 9626
      "CycleCount" = 3
      "AvgTimeToFull" = 0
      "DesignCapacity" = 5450
      "ManufactureDate" = 15124
      "BatterySerialNumber" = "xxxxxxxxxxxx"
      "PostDischargeWaitSeconds" = 120
      "Temperature" = 3099
      "InstantAmperage" = 0
      "ManufacturerData" = <000000000000000000000000xxxxxxxxxx000000000000000>
      "MaxErr" = 1
      "FullyCharged" = Yes
      "DeviceName" = "xxxxxxxxx"
      "IOGeneralInterest" = "IOCommand is not serializable"
      "Amperage" = 226
      "IsCharging" = No
      "DesignCycleCount9C" = 1000
      "PostChargeWaitSeconds" = 120
      "AvgTimeToEmpty" = 65535
    }


As this battery is designed to last 1000 cycles, I'm hoping it will give me 6 hours of backup when I need it for a long, long time - at least 3 years.
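
The arithmetic behind that hope can be checked with a few lines of Python. The numbers are copied from the battery dump above. Treating MaxCapacity divided by DesignCapacity as a "health" figure, and assuming usage stays at about 3 cycles per week, are simplifications made for this sketch; real battery wear is not linear.

```python
# Values copied from the AppleSmartBattery dump.
battery = {
    "MaxCapacity": 5573,
    "DesignCapacity": 5450,
    "CycleCount": 3,
    "DesignCycleCount9C": 1000,
}

# Assumption: usage stays at the first week's rate of about 3 cycles per week.
CYCLES_PER_WEEK = 3

health_pct = 100.0 * battery["MaxCapacity"] / battery["DesignCapacity"]
cycles_left = battery["DesignCycleCount9C"] - battery["CycleCount"]
years_left = cycles_left / CYCLES_PER_WEEK / 52

print("capacity vs. design: %.1f%%" % health_pct)   # 102.3%
print("cycles left: %d" % cycles_left)              # 997
print("years at this rate: %.1f" % years_left)      # 6.4
```

At that rate the 1000-cycle rating would take a little over six years to use up, so three years looks reachable as far as cycle count goes.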

Thunderbird 3: Better search user-experience, not there yet.

For my work e-mail, I used Microsoft Outlook with Exchange server for 2 years and I liked it a lot. The global address book integration, distribution list expansion and calendar/meeting scheduling features are especially awesome. In my current workplace, we don't have Exchange. And I'm on a MacBook. So I have a choice of Apple Mail, Thunderbird or Microsoft Entourage.
I tried Microsoft Entourage and didn't like it: it's nothing like Outlook, and its UI looks as if it's been resurrected from the 1970s. And without an Exchange server to connect to, it doesn't have much advantage over the others.

I also tried Apple Mail for a couple of months. Didn't like it either. Although it looks great compared to Entourage or Thunderbird, it isn't great at handling lots of e-mails in lots of IMAP folders. Its search is also lacking in speed.

Then I tried Thunderbird. I've been a big fan of Mozilla for a long time. And being a supporter of open source (where appropriate!), I decided that I could put up with minor quirks here and there in Thunderbird and would use it as my primary mail client. And so I've been using it for the past 2 years.

Over the years, Thunderbird has improved quite significantly. In particular, its ability to handle a huge number of e-mails in a huge number of IMAP folders is great. Its search is also quite fast. Although there have been lots of crashes (as I'm always on beta or even alpha builds, that's expected), the latest Thunderbird 3 release has been quite stable. No crashes so far. So overall I'm happy.

But I think Thunderbird can do much better with just a few minor improvements. Here's my list of low-hanging-fruit enhancements to Thunderbird that could greatly improve its UX.

  • Keyboard accelerators or special keywords (search operators) that map to the search filters in the quick search drop-down. This would speed up the search experience in a big way. I've filed this as an enhancement request in the Thunderbird bug tracker: https://bugzilla.mozilla.org/show_bug.cgi?id=538738. Please leave your comment there if you also think it's important.
  • Multiple addresses on a single line in the compose window. The current layout, with one address per line, is annoying when replying to a message that has a lot of recipients. Here's the enhancement request for this one: https://bugzilla.mozilla.org/show_bug.cgi?id=495241
  • In thread view, when a new message arrives in a collapsed thread, the thread should be shown in bold to indicate that there is an unread message hidden there. Otherwise, the user may miss reading the message.
If you are a Thunderbird hacker, please consider working on these. I myself would like to spend time on them. Maybe with Jetpack for Thunderbird, a simple jetpack could get these things done.