Cassandra Gift

Capture a relational model in Cassandra.
Source Code: https://bitbucket.org/johnmpage/gift/src/main/
How Gift Aligns Concerns
The Cassandra database achieves scale by distributing data across multiple nodes. These nodes can be located on multiple servers. This distributed architecture comes at a cost: querying multiple servers takes time. Fast, efficient queries are best achieved by limiting the number of …

Process Threads in Apache Tomcat vs AWS Lambda

Comparing Apache Tomcat threading to AWS Lambda, several points stand out:
- Apache Tomcat handles concurrent requests internally with a multi-threaded Java Virtual Machine (JVM).
- An AWS Lambda instance does NOT process requests concurrently: each instance handles one request at a time, so concurrent requests are handled by multiple Lambda instances.
- Scaling with Apache Tomcat is achieved with multi-threading and by load-balancing additional servers.
- AWS Lambdas …
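The single-JVM, multi-threaded model described for Tomcat can be sketched with a plain thread pool. This is only an illustration, not Tomcat's actual connector code: `ThreadPoolSketch`, `handle`, and `serveAll` are hypothetical names, and the pool size is an arbitrary assumption.

```scala
import java.util.concurrent.{Callable, Executors}
import scala.jdk.CollectionConverters._

object ThreadPoolSketch {
  // Hypothetical request handler: reports which worker thread served the request.
  def handle(requestId: Int): String =
    s"request $requestId served by ${Thread.currentThread().getName}"

  // Serve all requests concurrently on a fixed pool of worker threads inside one JVM.
  def serveAll(requestIds: Seq[Int], poolSize: Int): Seq[String] = {
    val pool = Executors.newFixedThreadPool(poolSize)
    try {
      val tasks = requestIds.map { id =>
        new Callable[String] { override def call(): String = handle(id) }
      }
      // invokeAll blocks until every task has finished and keeps the task order.
      pool.invokeAll(tasks.asJava).asScala.map(_.get()).toSeq
    } finally {
      pool.shutdown()
    }
  }
}
```

Calling `ThreadPoolSketch.serveAll(Seq(1, 2, 3), poolSize = 2)` runs three requests on two worker threads in the same process. In the Lambda model, by contrast, each concurrent request would be routed to its own instance.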

SOLR indexes tend to be larger than the documents they index.

Comparing the size of a data store with the size of the SOLR index of that data, one finds that the index is usually larger than the data indexed. This may seem counter-intuitive at first, but it actually makes perfect sense. To understand why, it's helpful to create a simplified …
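One way to get a feel for this is a toy positional inverted index. The sketch below is an assumption-laden illustration, not how SOLR or Lucene actually lay out an index (real indexes are compressed and segment-based), and `IndexSizeSketch`, `Posting`, `tokenize`, and `buildIndex` are hypothetical names. It shows that every token occurrence produces its own entry carrying a document id and a position, on top of the term dictionary and any stored copy of the original text.

```scala
object IndexSizeSketch {
  // One posting: the document a term occurs in and the token position within it.
  final case class Posting(docId: Int, position: Int)

  // Lowercase and split on whitespace; a stand-in for a real analyzer.
  def tokenize(text: String): Seq[String] =
    text.toLowerCase.split("\\s+").filter(_.nonEmpty).toSeq

  // Build a positional inverted index: term -> every (docId, position) where it occurs.
  def buildIndex(docs: Seq[String]): Map[String, Seq[Posting]] = {
    val entries = for {
      (doc, docId) <- docs.zipWithIndex
      (term, pos)  <- tokenize(doc).zipWithIndex
    } yield term -> Posting(docId, pos)
    entries.groupBy(_._1).map { case (term, ps) => term -> ps.map(_._2) }
  }

  // Total number of postings stored across all terms.
  def postingCount(index: Map[String, Seq[Posting]]): Int =
    index.values.map(_.size).sum
}
```

For the two documents "the quick fox" and "the lazy dog", the index holds one posting per token occurrence, each with two integers, plus the term keys themselves.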

Weirdness when every function returns a Column: Chained when (Spark)

When "when" calls are chained, evaluation stops at the first test that returns true.

    import org.apache.spark.sql.Column
    import org.apache.spark.sql.functions.{lit, when}

    val isTrue = lit(true)

    // Every branch tests true, yet only the first value is returned.
    def getWithChainedWhen(): Column = {
      when(isTrue, "1st")
        .when(isTrue, "2nd")
        .when(isTrue, "3rd")
    }

    // Assumes a spark-shell session, where sc and the toDF implicits are in scope.
    sc.parallelize(List("A"))
      .toDF("a")
      .withColumn("chained", getWithChainedWhen())
      .show(false)

The result of running the above code is as follows:

    +---+-------+
    |a  |chained|
    +---+-------+
    |A  |1st    |
    +---+-------+

Only the first when is …
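The first-match behaviour can be modelled without Spark at all. The following is a plain-Scala sketch of the semantics only, not Spark's implementation: `ChainedWhenModel`, `Branch`, and `firstMatch` are hypothetical names, and `None` stands in for the null that Spark returns when no branch matches and no `otherwise` is supplied.

```scala
object ChainedWhenModel {
  // A branch pairs a test result with the value to return when the test passes.
  final case class Branch(test: Boolean, value: String)

  // Return the value of the first branch whose test is true, or None if none match.
  def firstMatch(branches: Seq[Branch]): Option[String] =
    branches.collectFirst { case Branch(true, value) => value }
}
```

With three branches that all test true, `firstMatch` yields `Some("1st")`, mirroring the `chained` column above. Later branches are only reached when every earlier test is false.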

Stream IIS logs to Kafka

Introducing KafkaTailer
Kafka is a game-changer. As a powerful, centralized messaging tool, it performs extraordinarily well compared to other messaging applications. Kafka is popular in the JVM-Nux-Nix realms, and it is now possible to add your favorite Microsoft IIS application to your streaming pipeline as well. Using the best open-source libraries available, KafkaTailer can stream your IIS logs to any …

Why We Tag

Alternate Title: The Linchpin of Safe Patch Releases
Some would argue that Patch Releases to Production are inherently risky. In fact, with the right approach, the risk involved in a Patch Release can be small. The key to managing this risk is Continuous Integration and a disciplined release process. When a bug is discovered in code that was released …