Apache Spark offers many advantages for startups

Apache® Spark™[1] is a distributed computing framework that uses in-memory processing to run analytic applications up to 100 times faster than established technologies such as Hadoop[2]. Hadoop used to be the leading open-source Big Data framework, but the newer and more advanced Spark has recently become more popular. The two do not perform exactly the same tasks and are not mutually exclusive: they can work together. Apache Spark can help reduce the complexity of working with data and increase processing speed.

As noted above, fast computation on Big Data is one of Spark’s biggest merits. Beyond speed, from the perspective of an engineer working at a startup company, Spark offers several further advantages.

1. Flexible options for data storage
For distributed storage, Spark can interface with a wide range of systems, including Hadoop Distributed File System (HDFS), MapR File System (MapR-FS) and Amazon S3. A customised solution can also be implemented, whereas Hadoop is typically run on HDFS. Amazon S3[3] in particular is a reasonable option for startup companies because of its pricing, redundancy and extensibility.

2. Flexible options for programming
Hadoop is implemented in Java. To provide data summarization, query and analysis beyond what plain Java offers, several systems have been built on top of Hadoop.

・HiveQL(Hadoop+Hive)
・Pig(Hadoop+Pig)
・Hadoop Streaming

As this list shows, working with Hadoop often requires additional software, which can introduce further complexity and make maintenance harder.

Spark, on the other hand, is implemented in Scala but can be programmed in several languages.

・Scala
・Java
・Python
・R

Moreover, Spark SQL, one of Spark’s APIs, lets users manipulate data by running SQL queries written in basic SQL syntax.

This flexibility helps a startup company that has difficulty finding engineers familiar with a particular programming language, although using such an API can lower processing speed because of its computing overhead.
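
As a rough illustration of the Spark SQL route described above, the Python sketch below registers a small table and queries it with plain SQL. It assumes PySpark and a Java runtime are installed locally. The sample rows, the table name `people`, the query and the function name `names_aged_30_plus` are all hypothetical and chosen only for this example.

```python
# Hypothetical sample data and query; neither comes from a real system.
PEOPLE = [("alice", 34), ("bob", 45), ("carol", 29)]
QUERY = "SELECT name FROM people WHERE age >= 30 ORDER BY name"


def names_aged_30_plus(rows=PEOPLE, query=QUERY):
    """Run a SQL query on a local Spark session and return the matching names."""
    # Imported here so this file can be loaded on a machine without PySpark.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[1]")            # single local worker, no cluster needed
        .appName("spark-sql-sketch")
        .getOrCreate()
    )
    try:
        # Build a DataFrame from plain Python tuples and expose it to SQL.
        df = spark.createDataFrame(rows, ["name", "age"])
        df.createOrReplaceTempView("people")
        return [row["name"] for row in spark.sql(query).collect()]
    finally:
        spark.stop()
```

Calling `names_aged_30_plus()` on a machine with PySpark would be expected to return the names of the two people aged 30 or over. The same query could be issued from Scala, Java or R, which is the point of the flexibility argument.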

3. Open source
Last but not least, Spark is free and is one of the most active open-source big data projects. Startup companies often have limited finances, so Spark is a good starting point.

[1] Apache Spark [http://spark.apache.org/]

[2] Apache Hadoop [http://hadoop.apache.org/]

[3] Amazon Simple Storage Service (Amazon S3) [https://aws.amazon.com/s3/]

Developing our Chinese coming-soon page and learning AngularJS

This month, I mainly focused on developing v.2 of the coming-soon page for our innovative consumer app. The task was to replace the map with a timeline, move the ads from the sidebar to the bottom of the web page and insert an updated input form in the subscription section. The timeline is implemented mainly with the Hint.css tooltip library. The input form is extended with more text fields and radio buttons. I did not choose datalist because its style is hard to customize.

My next task was to develop a Chinese version of this website. To my surprise, one of the main challenges was making sure that the translation of the original English descriptions would be easily understood by Chinese users.

Here are the links to my work: http://comingsoon.hellouni.mobi/ and http://china.hellouni.mobi/

After finishing the revisions to the two websites, I started to study AngularJS, a structural framework for dynamic web apps. It lets developers use HTML as a template language and extend HTML’s syntax to express an application’s components clearly and succinctly. Angular’s data binding and dependency injection eliminate much of the code that developers would otherwise have to write. It all happens within the browser, making it an ideal partner for any server technology.

HTML is a great declarative language for static documents, but it does not offer much for building applications. The impedance mismatch between dynamic applications and static documents is often solved with:

1. a library – a collection of functions which are useful when writing web apps. Your code is in charge, and it calls into the library when it sees fit. E.g., jQuery.

2. a framework – a particular implementation of a web application, where your code fills in the details. The framework is in charge, and it calls into your code when it needs something app-specific. E.g., Durandal, Ember, etc.
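
The difference in who is in charge can be sketched in a language-neutral way. The Python example below is only an illustration of the control flow and is unrelated to jQuery or AngularJS code. `format_greeting`, `TinyFramework` and `greet` are hypothetical names invented for this sketch.

```python
from typing import Callable, List


# Library style: your code is in charge and calls a helper when it sees fit.
def format_greeting(name: str) -> str:
    """A 'library' function: it does one job and returns control to the caller."""
    return f"Hello, {name}!"


def run_with_library(names: List[str]) -> List[str]:
    # The application owns the loop and decides when to call the library.
    return [format_greeting(name) for name in names]


# Framework style: the framework is in charge and calls back into your code.
class TinyFramework:
    """A hypothetical, minimal 'framework' that owns the control flow."""

    def __init__(self) -> None:
        self._handlers: List[Callable[[str], str]] = []

    def register(self, handler: Callable[[str], str]) -> Callable[[str], str]:
        # Your code only fills in the app-specific detail.
        self._handlers.append(handler)
        return handler

    def run(self, events: List[str]) -> List[str]:
        # The framework owns the loop and decides when each handler runs.
        return [handler(event) for event in events for handler in self._handlers]


app = TinyFramework()


@app.register
def greet(event: str) -> str:
    return f"Hello, {event}!"


print(run_with_library(["Ann", "Bo"]))
print(app.run(["Ann", "Bo"]))
```

Both calls produce the same greetings. What differs is which side owns the loop: the application in the first case, the framework in the second.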

AngularJS takes a different approach. It attempts to minimize the impedance mismatch between document-centric HTML and what an application needs by creating new HTML constructs. Angular teaches the browser new syntax through a construct called directives. It is purely a client-side solution.

As a beginner, I am not yet familiar with it. Luckily, Olivia found a friend who is an AngularJS expert, and he gave an introductory overview of AngularJS. I found his presentation helpful.

Backend developer – the one who joins all these bits and pieces together

 

OK, so there you have it: a very simplified, high-level view of the system I am working on at ICAN. But there is much more to it…

As a backend developer, I spend most of my time and effort on improving database performance and extending server functionality. The server is like the boss of the whole structure: other components are organised around it and obey its rules. It dictates how data flows through the system and who is given access to specific information. There are multiple security layers in every component, and I make sure that each technology used is up to date so that all data is safe.

The system was built with scalability in mind and is maintained in the same fashion. As a startup, we are always ready for rapid growth and big bursts of new users, and so is our technology. The whole product is hosted in Amazon’s cloud (AWS), which is very elastic. This means that Amazon provides a virtualised infrastructure on which we deploy our services, and whenever we need more data storage or computational power, increasing the capacity takes just a few clicks.

Because we target international users, the service has to be available across the world and allow fast connections from different locations. This is not trivial. What might be surprising is that the responsiveness of a machine that is far away, say on the other side of the globe, is limited by the speed of light. As you might guess, we cannot beat that, but there are ways to work around it and give users a great real-time experience.
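
To put a number on the speed-of-light limit, here is a small back-of-the-envelope calculation in Python. It is only a sketch: the 17,000 km distance (roughly London to Sydney) is a hypothetical example, and the two-thirds slowdown for optical fibre is an approximation. Real round trips are slower still because of routing and processing.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458  # speed of light in vacuum
FIBRE_FACTOR = 2 / 3  # assumption: light in fibre travels at about two thirds of c


def min_round_trip_ms(distance_km: float, medium_factor: float = 1.0) -> float:
    """Return the lowest possible round-trip time in milliseconds.

    distance_km is the one-way distance; medium_factor scales the speed of
    light for the transmission medium (1.0 means vacuum).
    """
    if distance_km < 0 or not 0 < medium_factor <= 1:
        raise ValueError("distance must be >= 0 and medium_factor in (0, 1]")
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_PER_S * medium_factor)
    return 2 * one_way_s * 1000


# Hypothetical example: a server about 17,000 km away.
distance_km = 17_000
print(f"vacuum: {min_round_trip_ms(distance_km):.0f} ms")               # vacuum: 113 ms
print(f"fibre:  {min_round_trip_ms(distance_km, FIBRE_FACTOR):.0f} ms")  # fibre:  170 ms
```

Even in the ideal case, a request to the other side of the globe costs more than a tenth of a second, which is why serving users from a location near them matters.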

I cannot stress enough the ‘developer’ bit of being a backend developer at ICAN. It is far from just maintaining the finished product. It is about coming up with innovative ideas, supporting other members of the team and at the end of the day making the product ever better.