Thursday, March 11, 2010

Debugging private variables in Flex

Introduction
Proudly brought to you from the department of thuggery, buggery and skullduggery, here is a bit of a hack that enables you to access another class's private variables in a semi-controlled manner.

You want to do what?!?
Many out there will be shaking their heads right now. Yes, I agree: a belligerent attitude towards breaking a canonical axiom of Object Oriented programming rarely ends well. However, given the right situation and the right preventative steps, it can be a useful way to get around certain... how shall we say... issues.

As always with dirty rotten hacks, let's investigate the rule we are breaking, followed by when you definitely should not break it.

Why private variables are good
Private variables help in preventing an anti-pattern known as Action at a Distance. If you let any old programmer from any old class access any old variable and change it, you are opening yourself up to a real world of hurt.

Why private variables are bad
Most programmers aim to follow Test Driven Development and a good Separation of Concerns philosophy. These two goals can be at loggerheads at times: a good selection of private and protected variables shields your application from turning into a mess of spaghetti, but when it comes to testing, the same protections that keep other programmers out keep your test framework out as well.

When, where and why you want to break encapsulation
My first year lecturer was a pretty cool dude who always wore a sarong, no shoes, and on more than one occasion rode into our lecture theatre on his motorcycle.

We once asked him why you should not be able to access private variables from the outside, even from instances of the same class. His response was for us to imagine that we are all instances of the class of students. We definitely should not be able to just start messing with each other's privates. The discussion abruptly ended there.

But when would it be acceptable to access a private variable? The answer is: whenever you are performing diagnostics. It isn't a bad thing for your doctor to have access to private areas in the process of diagnosing a problem, is it? It is the same for your code.


Let's get down to code
We'll start off with the ArgumentFunctionPair object. This is basically the name of the property you want to inspect, paired with a function that is invoked to pass the value back out.
public class ArgumentFunctionPair
{
    public var argument:String;
    public var functionVar:Function;

    public function ArgumentFunctionPair(argument:String, functionVar:Function)
    {
        this.argument = argument;
        this.functionVar = functionVar;
    }
}

Next we have the visitInternal function, which is added to the class you wish to allow the visitor into.
// Reflective function to test properties and values for unit testing purposes.
protected function visitInternal(errorFunction:Function, argFunctionPair:ArgumentFunctionPair):Boolean
{
    var integrityOK:Boolean = false;
    var returnObj:Object;

    try
    {
        // Bracket access resolves the named property on this instance.
        returnObj = this[argFunctionPair.argument];
        integrityOK = true;
    }
    catch (e:Error)
    {
        errorFunction("ERROR: " + argFunctionPair.argument + " was not found.");
    }

    // The value is handed out through the callback; it is never written back,
    // so the visitor stays read-only.
    argFunctionPair.functionVar(returnObj);
    return integrityOK;
}
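
As a rough sketch of how a test might use this pair (MyWidget and its private secretCounter field are made-up names, and the sketch assumes visitInternal has been made reachable from the test, for instance via a public wrapper or a test subclass):

// Hypothetical usage from a test: ask the widget to visit its own private
// "secretCounter" field and hand the value out through a callback.
var widget:MyWidget = new MyWidget();
var captured:Object;

widget.visitInternal(
    function(message:String):void
    {
        trace(message);                       // the property name could not be resolved
    },
    new ArgumentFunctionPair("secretCounter", function(value:Object):void
    {
        captured = value;                     // the visitor hands the private value out
    })
);

trace("secretCounter is currently " + captured);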


So why go through this elaborate ruse?
It's all about preventing a visitor from changing a private internal. You never know when some clever programmer will get in there and think he can sneak a change in when you're not looking.

With this approach, it becomes quite a bit harder to turn the function into a way of modifying the class's private variables.

Tuesday, March 2, 2010

Security in the Cloud - Avoiding the equivalent of an open relay

Few things strike terror into the hearts of seasoned administrators. An open relay is one of them.


What is an open relay?

An open relay comes from the good old days of the Internet, when mail servers would simply accept requests to send messages on to other mail servers. It was built on a model of trust, and was conceived before anyone saw a benefit in sending messages about cheap places to buy Viagra to completely random strangers.

An open relay costs big money, not only because of the bandwidth used, but because your server ends up blacklisted as a result of spam reports.


So how does this apply to the cloud?

I saw a demo from Dan Orlando on how to integrate an Adobe Air application with S3. Very cool stuff, and definitely worth looking into.

In summary, he provides an example of how you can access your S3 bucket.

The second I saw this application, I thought to myself: awesome! This could be used to collaborate on files, sending information to an online repository so that multiple people can access it.

Then I came to the realisation that each person would need to have a public/private access key for the bucket. This is not reasonable in a large system. I've researched the topic a bit and discussed it with a few people, and they have a range of ideas, each one more terrifying than the last.

Why is this terrifying? Because you are basically giving a third party user the keys to your cloud platform. Why is this a problem?

Show me the money!

It is a problem. It doesn't take a genius to realise that giving a nefarious person access to a storage system to place files is simply a bad idea. Why?

1. Phishing port

Your bucket becomes a place to store files on the internet, such as pictures for spam campaigns. You take the heat, you pay for the bandwidth, and they get to sell their Viagra/Tic Tacs.

2. Dark Side File store

From WAREZ to indescribable images and movies that will get Interpol kicking down your door, your platform can be used for some pretty darn nasty stuff.

Let's not forget trojans, viruses and the like.

3. Other access to Amazon

They don't only gain access to your bucket: with your root key, it isn't inconceivable that some black hat will spawn off a few EC2 servers to crack a few SSL key pairs, or otherwise do all manner of distributed number crunching. After all, this is what Grid, er, I mean Cloud Computing was designed for!

Not to mention that you'll be getting quite a huge bill at the end of it all.

What should I NEVER do under any circumstances?

As Amazon quite clearly states: never give out or embed your secret key in your Flex/AIR applications.

The problem is that, as far as I can tell, to communicate directly from the client to S3 you must include the private key in the communication.

A RESTful state of unREST

In the good old days, when programmers were programmers and compilers were compilers, there was a level of automatic obfuscation present: when you compiled your executable, it went to machine/byte code, and that was the end of it.

Unfortunately, with applications like Flex, all anyone needs to do is run your .SWF file through a decompiler like Trillix, and there is absolutely no way to keep your procedures secret, including your authentication mechanism.

So let's say you run a login service in a mid tier like Java, although it could be PHP, Ruby, whatever. Let's say that the user supplies their username/password to authenticate, and is then passed a private key to access a set of buckets. Problem solved, right?

Wrong.

There is no expiry on this key at this point in time, and it grants complete and unrestricted access to the bucket.

What are some possibilities for security?

Token Based Authenticated Security

If it were possible to issue a set of tokens, with a set of permissions placed on the files in S3, that would be ideal.

S3 is not about services that need to be allowed, but rather permission to read data. If you are blocked from reading the data, you have satisfied the security requirements. If you have access to the data where you should, and only where you should, you have met the user requirements.

I am unaware of a token-based authentication service from AWS. If you could store your private key on your Java/PHP/whatever mid tier, securely held behind your standard security perimeter, you could authenticate the username/password at that service. The service would then call AWS and register a token with a set expiry date and a set of permissions.

The token would then be sent to the client, and the client could use the security privileges that were previously authenticated.
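
As a rough sketch of what the client side might look like (the /api/login endpoint, the response format, the token-aware file proxy and every other name here are assumptions of mine, not an existing AWS feature), the Flex/AIR application authenticates against your own mid tier and then attaches only the returned short-lived token to subsequent storage requests:

import flash.events.Event;
import flash.net.URLLoader;
import flash.net.URLRequest;
import flash.net.URLRequestMethod;
import flash.net.URLVariables;

// 1. Authenticate against your own mid tier; the account's secret key never leaves the server.
function login(username:String, password:String):void
{
    var credentials:URLVariables = new URLVariables();
    credentials.username = username;
    credentials.password = password;

    var request:URLRequest = new URLRequest("https://example.com/api/login"); // hypothetical endpoint
    request.method = URLRequestMethod.POST;
    request.data = credentials;

    var loader:URLLoader = new URLLoader();
    loader.addEventListener(Event.COMPLETE, onLogin);
    loader.load(request);
}

function onLogin(event:Event):void
{
    // Assumed response shape: "token=...&expires=...", issued by the mid tier after it
    // has registered an expiring, permission-limited token on the client's behalf.
    var response:URLVariables = new URLVariables(URLLoader(event.target).data);
    requestFile(response.token);
}

// 2. Subsequent storage requests carry only the short-lived token, never the secret key.
function requestFile(token:String):void
{
    var vars:URLVariables = new URLVariables();
    vars.token = token;

    var request:URLRequest = new URLRequest("https://example.com/api/files/report.pdf"); // hypothetical proxy
    request.data = vars;

    var loader:URLLoader = new URLLoader();
    loader.addEventListener(Event.COMPLETE, function(e:Event):void
    {
        trace("received " + URLLoader(e.target).data.length + " bytes");
    });
    loader.load(request);
}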

Other

Ladies and Gentlemen of the Internet, please comment on any other ideas you might have on this topic. Also, watch this space, as I will be updating it as more information/AWS functionality comes to light.

Wednesday, February 24, 2010

Eventual Consistency vs Atomic Consistency

I was having a very interesting and neurally stimulating conversation with Brian Holmes, who runs SmashedApples, a cloud computing framework for Flex, about Eventual Consistency and trusting that data is correct.

In my original post on swarm computing, I was discussing the use of shards over a distributed system. One thing that I hadn't considered was the possibility that you could write data to more than one source. I still held to the principle that when it comes to data, a system requires a single source of truth. After pondering this thought for a while, I have firmly come to the conclusion that you do not necessarily require a single source of truth.

What is Eventual Consistency?
Eventual consistency is the premise that all copies of a piece of data will eventually propagate and update themselves. I always had a problem with this. Most systems assume atomic consistency: once a write is made, any further reads will return the updated information.

This assumption of atomic consistency is a hangover from decades of programmers working with Relational Databases on single systems. In distributed collaborative systems, there is always a degree of lag. It comes down to how that lag is handled by the system.

The concept of Read Validation over Write Validation
We have all encountered and used write validation. By Write Validation, I mean that you cannot save the state of a system without passing all integrity checks.

Let's say we have a payment form for a utility company, with a field where you enter how much you want to pay on your bill. Under write validation, you can't put -$100 into the field and save the form. Under read validation, you can save any value to the server, but you cannot submit the form while -$100 is in the field.
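
To make the distinction concrete, here is a minimal sketch (the PaymentForm class and its methods are invented for illustration): write validation refuses to persist an invalid value at all, while read validation lets any value be saved and only enforces the rule at submission time.

// Hypothetical payment form model illustrating the two styles of validation.
public class PaymentForm
{
    public var amount:Number = 0;

    // Write validation: an invalid value prevents the state from being saved at all.
    public function saveWithWriteValidation():Boolean
    {
        if (amount < 0)
        {
            return false;               // refuse to persist anything
        }
        persist();
        return true;
    }

    // Read validation: any value may be saved; the check only runs at submission time.
    public function save():void
    {
        persist();                      // -100 is allowed to be stored
    }

    public function submit():Boolean
    {
        if (amount < 0)
        {
            return false;               // submission is where the business rule is enforced
        }
        persist();
        return true;
    }

    private function persist():void
    {
        trace("saved amount " + amount); // stand-in for a real save to the server
    }
}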

Example 1 - A Holiday in Elbonia:
Let's say a fellow decides to go on holiday. Let's call him John Smith. John loves adventure, danger and generally experiencing life at the edge of civilization. In fact, much to the dismay of his twin sister Jane, John would much rather have been involved in anthropological activities at a National Geographic level than be a computer programmer.


This is why John has decided to depart for the shores of the far-away island of Elbonia.

To prevent his dear sister from suffering a heart attack from worry, John has promised Jane that he will provide her with regular updates as to where he is and what he is doing. Hence, before checking in, John decides to update his Facebook profile to say that he is at the airport.


Under the atomic consistency model, John would push the update to the server, the server would update, and anyone who read his Facebook page would immediately see he is at the airport.

Under the Eventual Consistency model, under normal operation, both John and Jane would see the same thing within a matter of seconds, as the cloud nodes holding the data are updated when the call is made.

But what would happen if the City Node was unable to talk to the central Facebook cloud?

John would see that his change has been accepted on his device, but his sister Jane would still see his previous status until the City cloud node that she reads from has updated.

But what happens if multiple updates occur?

Let's assume that John is busy playing the latest game he downloaded to his portable device, and nearly runs his battery flat. It is a long flight to the Balkans, especially if you fly Elbonian Airlines. All along the way, John updates his status, but his device cannot talk to the Internet as it must be in Flight Mode.

When John arrives in Armpit, the capital of Elbonia, he updates his status to "Arrived At Customs", at which time all of his previous updates are sent.

The Internet is not so reliable in a fourth-world country such as Elbonia, and as luck would have it, the link to the outside world is currently down. When John leaves customs five hours later, he receives a frantic call from Jane, who says he hasn't updated his Facebook status. John calmly explains that the internet is down in Elbonia, but that he's heading for the hotel now. Just then, John's phone runs out of power.

After her conversation with John ends, Jane logs in to John's account and updates his status to "Going to the hotel".

When John returns to the hotel, he logs in to update his status. He still sees "Arrived At Customs", so he updates his status to "At Hotel".

An hour later the Internet in Elbonia is back up again, and the servers update. So what should everyone see?

An assumption must be made that the latest data is the most relevant. Each status change carries a timestamp. Jane's timestamp is earlier than John's, so the servers can apply the changes to the record in the correct order, and the latest status change is that John is "At Hotel".
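
As a rough sketch of that resolution rule (the field names are illustrative), each status change carries the time it was made, and whichever update is newer wins when the nodes finally reconcile:

// Each status change records when it was made; the newest timestamp wins.
function resolve(a:Object, b:Object):Object
{
    return (a.madeAt >= b.madeAt) ? a : b;
}

// Jane's proxy update was made before John's own update, so John's wins.
var janes:Object = { text: "Going to the hotel", madeAt: new Date(2010, 2, 1, 14, 0).time };
var johns:Object = { text: "At Hotel",           madeAt: new Date(2010, 2, 1, 16, 30).time };

trace(resolve(janes, johns).text); // traces "At Hotel"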

Example 2 - Withdrawing cash in Elbonia:

After settling in to his hotel room, John decides that he wants to go out and buy some food, but he forgot to change money back at the airport. He needs some Elbonian Dollars ($EB), or as they are colloquially known, Grubniks.




Let's assume that there are two banks, the Bank of Elbonia (BoE), and the First National Bank of Elbonia (FNBE). Say that John has a bank account with WestBank back in civilization, with $101 in it.



Let's say John tries to withdraw $100 from the BoE, and $100 from FNBE immediately afterwards. Given the case of atomic consistency, John can successfully withdraw $100 from BoE, but not from FNBE, as he has insufficient funds. This is correct behaviour.


If we are using Eventual Consistency, under normal circumstances the lag between one bank and the next will be far less than the time it takes John to complete his transaction, let alone to get to another ATM and withdraw funds. Hence, John can successfully withdraw $100 from BoE, but not from FNBE.

Now let's say that Elbonia is having internet issues again, and the connections between BoE, FNBE and WestBank are down. This gives us a few scenarios.

Under the atomic consistency model, as BoE and FNBE cannot talk to Westbank, the transaction will fail. This is obviously not a good situation for the client, as they do not have legitimate access to their funds. This is because we are relying upon Write Validation.

Under the Eventual Consistency model, John will be able to withdraw $100 from BoE, which will record the $100 delta change on his account. Since the bank has not yet been able to communicate the delta change, our intrepid explorer can successfully withdraw $100 from FNBE as well, which still sees the original $101 balance. Right?

Well, yes and no. It's just a question of when. Let me first clarify that if John were to return to the BoE or the FNBE and try to withdraw money, he would be greeted with a $1 balance.

Eventually the internet (even in Elbonia) will come back up. This is when WestBank receives the two $100 withdrawal deltas, leaving our punter with a negative balance of -$99 in his account.

The advantage of using Read Validation in this circumstance is that John will be able to legitimately withdraw $100 from his bank account. The disadvantage is that John will be able to illegitimately withdraw a further $100 from his bank account, which will need to be resolved by business procedures and workflows later.
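
To make the delta idea concrete, here is a minimal sketch (names invented): each bank records a signed delta rather than overwriting the balance, and the home bank simply applies every delta it eventually receives, even when the result goes negative.

// Hypothetical reconciliation: the home bank applies every delta it eventually receives.
function applyDeltas(openingBalance:Number, deltas:Array):Number
{
    var balance:Number = openingBalance;
    for each (var delta:Number in deltas)
    {
        balance += delta;           // no validation here: read validation happens elsewhere
    }
    return balance;
}

// The two $100 withdrawals recorded offline by BoE and FNBE.
trace(applyDeltas(101, [-100, -100])); // traces -99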

But I bet a few of you out there, just like John, might be thinking: hey, that's great! Why don't I just pull out a million bucks in cash and move to Paraguay? It comes down to the business rules of an ATM to prevent you from doing that. Remember that there is a maximum amount that can be withdrawn from an ATM at a given time, say $1000, and there are only a set number of banks operating in a country.

Following this assumption, not many people will be withdrawing copious amounts of money given these circumstances, as generally Terms of Service will mean that the banks can communicate with each other in under a day in the worst of disasters.

Couple this with the high level of scrutiny placed on bank account holders under the Anti-Money Laundering and Counter-Terrorism Financing (AML/CTF) legislation that has been passed in many countries, and you have a high level of assurance that the person exists and is legitimate. In any case, a shady character would not be able to extract enough from the system to make committing fraud worth their while.

This level of risk can be acceptable for some banks. But what if the bank that John transacts with does not wish to allow withdrawals if communication channels drop out? This can technically still be achieved.

Some objects of very high importance may still require a single source of truth to authorize transactions. Given this situation, you may only have partial functionality should a link go down. As in, John could check his balance from an Elbonian ATM, but could not actually withdraw any money.

There is another situation as well. Some bank accounts may allow for a certain amount of money to be paid out while the link is down, but other bank accounts may not. This would merely be a piece of data associated with John's account to say that he can withdraw up to $800 per day, or $1000, or however much risk both John and his bank are willing to accept.

In the end, in this example the Read Validation method provides more flexibility and better customer service for the consumer, whilst still allowing the problem to be resolved later.

Example 3 - The disjointed timesheet
As always, on his first day on holidays, John gets a phone call from his boss, Bob. After the usual discussion of explosions, earthquakes, fires, floods and tidal waves caused by a small show-stopper defect in John's code that got shaken out in testing, John has to provide a fix.

Of course this is not that big a problem for John, who implements the fix and updates SVN in under 8 hours, but now he has to update his timesheet. The timesheet application is cloud-enabled, so he updates his timesheet, blissfully unaware that Elbonia has lost its connection to the outside world again.

John happens to be assigned to a task called "Release Defects", against which he logs 8 hours. This update reaches the server.

In the meantime, the whole discussion about John in the office means that Dave, the resource manager, remembers that he had promised John that he would fill out his holiday timesheets for him. So, he opens John's timesheet and places 8 hours of work against his vacation task.

The timesheet has a business rule on it that says that no more than 12 hours of time can be assigned on a particular day. So how would the system resolve this situation?

Updates can occur on mistaken axioms. Dave assumed that John had not placed any time on his timesheet; John assumed that Dave had not placed any time on it either. Both of them relied upon a mistaken belief: Dave thought that John had spent no time working, and John didn't know that Dave had updated his time.

This is why Read Validation must allow data that is invalid according to the business rules to exist. Although there are currently 16 hours' worth of work on John's timesheet, users should still be able to enter more time. However, when John goes to submit his timesheet for approval, the submission should be rejected, as it does not satisfy the 12-hour business rule.

So what would happen if Dave subsequently submitted John's timesheet?

There is a business rule that states that once a timesheet has been submitted for approval, it cannot be changed unless it has been unsubmitted.

So let's say that John added his time after Dave submitted his timesheet. In this case it is clear-cut: John should be informed that his timesheet had already been submitted before he added his time. It would then be up to John to fix the situation.

If John added his changes before Dave submitted John's timesheet on his behalf, then the system should allow John's changes, roll back the submission, and inform Dave that there is a discrepancy of time.

But we need a new piece of technology here. We need to know when the update was made, and the timestamp of the data that the user had seen at the time. Each update needs to carry both pieces of information, so that users can understand how each decision was made and resolve the conflict. Those familiar with the Concurrent Versions System (CVS), Subversion (SVN) et al. will recognise that this kind of temporal skew in information has been dealt with for quite a long time already; it is an established principle.
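
Here is a rough sketch of what such an update record might carry (the class and field names are invented): the change itself, when it was made, and the timestamp of the data the author was looking at, so a conflict caused by a stale view can be detected.

// Hypothetical timesheet update carrying both timestamps described above.
public class TimesheetUpdate
{
    public var hours:Number;        // the change being applied
    public var madeAt:Number;       // when the user made the change
    public var basedOn:Number;      // timestamp of the data the user was looking at

    public function TimesheetUpdate(hours:Number, madeAt:Number, basedOn:Number)
    {
        this.hours = hours;
        this.madeAt = madeAt;
        this.basedOn = basedOn;
    }

    // If this update was based on data older than the record's last change, the author
    // was working from a stale view and the conflict must be surfaced for resolution.
    public function isStaleAgainst(recordLastChangedAt:Number):Boolean
    {
        return basedOn < recordLastChangedAt;
    }
}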

In Conclusion
As my friend Brian Holmes always says, cloud computing is as though every function is an island that can be calculated in isolation, with the results shipped out to their destination.

This is just like us humans. We are all islands that have limited vision of the world that we live in. It is not about being perfect in our universe, but rather about being fault tolerant.

Although Atomic Consistency is the ultimate in data integrity, in reality, we never truly have Atomic Consistency in life. We follow business rules and have elastic error validation to tolerate the False Axiom problem. This allows us to have Eventual Consistency, and when a data collision occurs, we can collaborate to resolve the issue correctly.

The Eventual Consistency model comes down to the level of trust that you place in the action, and the level of damage that can occur should an irreversible collision happen, such as withdrawing funds from an ATM whilst you have a negative balance.

This is a business decision that can be made on a use-case basis, so you can still require a single source of truth for the select transactions where the business does not wish to accept the risk. This is exactly like human transactions.

If you go to a store, and their credit card facilities are down, and you don't have any cash, chances are, you are not walking out of there with the goods that you want. This is exactly the same situation with distributed transactions.

As for John, he has successfully conquered Elbonia and gotten paid for the day of his holiday that he had to work.

Meanwhile, Jane is back to worrying whether John is eating a healthy lunch and flossing regularly. No-one said that it's easy being John.

I on the other hand, would like to hear what your thoughts are on this topic.

Tuesday, February 16, 2010

A manifesto on swarm and utility based cloud computing

Introduction

So why are swarm and utility based cloud computing so interesting? Because businesses wish to reduce cost and increase the scalability of their systems. The only way to achieve this is by removing the overhead of managing systems and of having them sit idle.


In my mind, the largest problem with distributed computing is a mindset shift. Right now we focus mainly on pushing data to the cloud, with data size and transfer left as an afterthought. My goal with a swarm-based framework is to focus on moving the process rather than the data. I believe this can be achieved by structuring the storage strategy to optimise and limit the amount of data transferred, and only then moving the process. Execution speed is not a consideration at this point, as the client CPU is rarely the limiting factor.


The goal of this blog post series is to hold an open and frank discussion about how to best achieve the goals outlined here, and to avoid any further pitfalls that may arise.



There are several major considerations that I have thought through for a successful cloud or swarm framework.


Data Shards
The framework requires the data to be divided into meaningful segments of knowledge, or "Shards". Basically, a shard is a predefined set of data that has an explicit relationship dictated by the programmer. For instance, in a banking home loan system, a person's contact details combined with their home loan information would be considered to belong to a particular shard of data.

A shard of data can belong to another shard. For example, the particular home loan data for a person can belong to a set of home loan data that belongs to a mortgage broker, as well as a bank. This becomes particularly useful for null dressing as well as Garbage collection, as you can explicitly pull information from the cloud, as well as delete objects that are referenced by shard. I will expand on this in a later post.
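
As a rough illustration of what a shard might look like in code (all names here are hypothetical), a shard is just an explicitly declared group of related data that can itself own child shards:

// Hypothetical shard: an explicitly declared, nestable grouping of related data.
public class DataShard
{
    public var id:String;
    public var data:Object;                    // e.g. { contact: {...}, homeLoan: {...} }
    public var children:Array = [];            // shards owned by this shard

    public function DataShard(id:String, data:Object = null)
    {
        this.id = id;
        this.data = data;
    }

    // e.g. a mortgage broker's shard owning the home-loan shard of an individual customer:
    //   broker.addChild(new DataShard("loan-jsmith", { contact: "John Smith", loanBalance: 250000 }));
    public function addChild(shard:DataShard):void
    {
        children.push(shard);
    }
}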

Using Data Shards to reduce the amount of movement of information
My thoughts on how a shard applies to limiting the movement of data go as follows:

Moving data has two dimensions of expense: the amount of data, and how expensive it is to move it. The former is self-explanatory, but the latter becomes interesting with Flex 4's capability for peer-to-peer communication between clients. Take an office situation: people who access similar types of data tend to cluster together, both geographically and from a network perspective. For instance, in a bank, insurance brokers work on the same network, sales work in the same building, and mortgage brokers do the same.

It makes no sense for all of these people to contact the server to pull down the same data. This is where the concept of data gravity comes in. When a client accesses a piece of data, a counter records the total number of times the particular shard has been accessed, and an implicit total is tallied down through the children from the root source shard.

This counter is used to rank the amount of carpet wear on a shard, making it more important, akin to increasing its mass. The client caches data according to its level of importance, and discards and stops listening to changes on the least important objects. Updating the data is another problem, but I think we will leave that discussion for later as well.
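
A very rough sketch of that ranking (all names invented, and ignoring the update problem entirely): each access bumps the shard's counter and the counters of its ancestors, and the client evicts the lowest-ranked shards first.

// Hypothetical access counter: each read increments this shard and every ancestor,
// so heavily used branches accumulate "mass" and stay in the client cache longest.
public class ShardGravity
{
    public var shardId:String;
    public var parent:ShardGravity;
    public var accessCount:int = 0;

    public function ShardGravity(shardId:String, parent:ShardGravity = null)
    {
        this.shardId = shardId;
        this.parent = parent;
    }

    public function recordAccess():void
    {
        accessCount++;
        if (parent != null)
        {
            parent.recordAccess();          // the implicit total tallied up toward the root shard
        }
    }

    // Eviction order for a client cache: the least "massive" shards are discarded first.
    public static function evictionOrder(shards:Array):Array
    {
        return shards.sortOn("accessCount", Array.NUMERIC);
    }
}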

I have thought about naming this algorithm Data Gravity, as the importance of the object is akin to a planet increasing its mass, and clustering objects together in space.

This will make further sense when I expand on my ideas with the incorporation of Hyper Planes, Neural Networks and Convex hull structures with respect to Data Shard clustering in a future blog post and introduce the concept of a Gravity Well.

Private Clouds and Data Security

A large sales barrier to Cloud computing is data security and integrity. Private clouds seem to be gathering momentum. However, the ideal should be a mix of the two.

Non-sensitive data can be pushed to third-party cloud vendors, while sensitive information should be kept on the company's own infrastructure.

What you pay for should be used as close to maximum capacity as possible, while there is always the option to shed load, such as serving public web pages or public document storage, to an external provider should your resources become overloaded.

The idea of adding a security clearance level to a data shard allows the private cloud framework to manage the distribution of data shards by servers and members. For instance, a server inside the organisation may be instructed to store anything from top-secret shards down to public-access information, while the Amazon or Google cloud, say, may be set to store only public-access information.

A more complex security policy might add access to particular types of data shards. For instance, if you are the creator of a data shard, you would get access even if it is top secret. Furthermore, you might grant other people direct access to the shard at this level, as well as transitive access through belonging to a team or group.
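
Here is a sketch of how a clearance level might be attached to a shard and checked against a storage node (the levels and class names are illustrative only):

// Hypothetical classification levels, ordered from least to most sensitive.
public class Clearance
{
    public static const PUBLIC:int     = 0;
    public static const INTERNAL:int   = 1;
    public static const SECRET:int     = 2;
    public static const TOP_SECRET:int = 3;
}

// A storage node advertises the highest classification it is allowed to hold;
// an internal server might accept TOP_SECRET, a public cloud only PUBLIC.
public class StorageNode
{
    public var name:String;
    public var maxClearance:int;

    public function StorageNode(name:String, maxClearance:int)
    {
        this.name = name;
        this.maxClearance = maxClearance;
    }

    public function canStore(shardClearance:int):Boolean
    {
        return shardClearance <= maxClearance;
    }
}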

Data Recovery Management

DRM is a huge area that costs organisations a very pretty penny. It is also an area where, if that pretty penny is not spent and a risk materialises into an issue, it becomes a significant unexpected expense, or worse.



Cloud has brought forward a solution that is more like fool's gold than anything else.



The problem is that small amounts of data can definitely be pushed to the cloud for backup purposes; however, for larger volumes of data, such as movie files or images, cloud storage is simply not an option.



Security is also a major issue here, as it is very foolish for an organisation to push something like its complete client list or its accounts to the cloud. Hence, these files tend not to be backed up there, and are instead handled with a more traditional DRM strategy, such as being stored on a file server or copied to CD.



The problem here is that an organisation that moves its DRM reliance to the cloud will find its traditional DRM strategy atrophying. In my days working as a systems administrator, I came across many sites where my predecessor had diligently made backups to CDs or removable storage drives, but had never tested whether they worked.



In such a circumstance, it becomes a dangerous placebo.



A swarm-based DRM could be very interesting, in that the data movement and update solution can be applied to storing files, encrypted, on a peer-to-peer basis in a private swarm cloud. This way, if a client, or worse a server, is knocked out by disaster, a recovery process can automatically reconstruct the data from the swarm of clients.



I will discuss this further in a later post.



Unidentified Issues or Benefits

At this point I would like to open the floor to any and all who read this post to provide thoughts and feedback on the ideas I have outlined above.



I feel that there are many untapped possibilities that have not been thought of yet in this space, and I would love to explore them with the community.