August 7, 2016
I used Docker at work a couple of days ago, so here is a short overview of what Docker is and how to use it. What is Docker? Docker is a hot containerization platform that wraps an application and all its dependencies in a self-contained execution environment, which is easy to pack and ship as you go. What makes Docker so popular? Co...
March 7, 2016
A regular expression, sometimes abbreviated to regex, is a sequence of characters that describes a search pattern in text. As a simple example, if you want to list all the PNG images in a folder, the regular expression would be as simple as .+\.png. Here "." represents any character and "+" means 1 or mo...
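A minimal sketch of the PNG example in Python, assuming we match against a list of file names rather than reading a real folder (in practice the names could come from os.listdir). The helper name list_png is hypothetical. It uses re.fullmatch so the whole name must fit the pattern; with re.search, a name like "logo.png.bak" would also match.

```python
import re

# One or more characters followed by a literal ".png".
# The raw string keeps the backslash, so "\." matches a period rather than any character.
PNG_PATTERN = re.compile(r".+\.png")


def list_png(filenames):
    """Return the names that match the PNG pattern in full (case-sensitive)."""
    return [name for name in filenames if PNG_PATTERN.fullmatch(name)]


files = ["cat.png", "notes.txt", "diagram.PNG", ".png", "logo.png.bak"]
print(list_png(files))  # ['cat.png']
```

Note that "diagram.PNG" is skipped because the pattern is case-sensitive, and ".png" is skipped because ".+" requires at least one character before the extension.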
March 15, 2015
Version control is an important concept in software engineering because developing software is cumulative work and you need to properly manage all the files and their changes. With a good Version Control System (VCS), you can compare changes over time or revert the entire project back to a p...
February 11, 2015
Understanding Pointers: Programming is concerned with manipulating data, which is normally located in memory. Besides using identifiers to access data, C++ programs have another way to reach data, based on its address in memory. In memory, each variable has a unique address. A pointer can...
April 10, 2014
Problem: Convert a non-negative integer to its English words representation. The given input is guaranteed to be less than 2^31 - 1. Solution: As we know, a number can be read in groups of 3 digits, each group named by a scale word: thousand, million, billion, trillion, quadrillion … According to the description, the number is no larger th...
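The excerpt stops before the solution details, so here is only a sketch of the 3-digit grouping idea it describes, in Python. The output format (capitalized words, no "and", no hyphens) and the helper names three_digits and number_to_words are assumptions for illustration. Because the input is below 2^31 - 1, scales up to "Billion" are enough.

```python
BELOW_20 = ["", "One", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight",
            "Nine", "Ten", "Eleven", "Twelve", "Thirteen", "Fourteen", "Fifteen",
            "Sixteen", "Seventeen", "Eighteen", "Nineteen"]
TENS = ["", "", "Twenty", "Thirty", "Forty", "Fifty", "Sixty", "Seventy",
        "Eighty", "Ninety"]
# Scale word for each 3-digit group, from least to most significant.
SCALES = ["", "Thousand", "Million", "Billion"]


def three_digits(n):
    """Spell out 0 <= n < 1000 as a list of words (empty list for 0)."""
    words = []
    if n >= 100:
        words += [BELOW_20[n // 100], "Hundred"]
        n %= 100
    if n >= 20:
        words.append(TENS[n // 10])
        n %= 10
    if n > 0:
        words.append(BELOW_20[n])
    return words


def number_to_words(num):
    """Convert 0 <= num < 2**31 - 1 to English words, one 3-digit group at a time."""
    if num == 0:
        return "Zero"
    words = []
    scale = 0
    while num > 0:
        chunk = num % 1000  # lowest remaining 3-digit group
        if chunk:  # an all-zero group contributes no words and no scale
            scale_word = [SCALES[scale]] if SCALES[scale] else []
            words = three_digits(chunk) + scale_word + words
        num //= 1000
        scale += 1
    return " ".join(words)


print(number_to_words(12345))  # Twelve Thousand Three Hundred Forty Five
```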
June 22, 2014
Spark is a lightning-fast cluster computing technology. Its in-memory cluster computing feature increases the processing speed of an application. The Resilient Distributed Dataset (RDD) is the fundamental data structure of Spark, and it can be created in two ways: by referencing datasets in external st...
March 21, 2017
Topic modelling is a technique to automatically discover the hidden topics in a collection of documents. The basic idea is that words with similar meanings will occur in similar documents. A document is then modeled and described by its topic coverage, with each topic being a distribution over words. Latent Semantic Indexing (LSI) and ...
December 16, 2016
Word embeddings are dense vectors used to represent word meanings. Both Stanford’s GloVe and Google’s word2vec are open-source packages that provide efficient implementations for training these vectors. Many sets of pretrained vectors can be found online. I downloaded the pretrained word vectors from the GloVe ...
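As a rough sketch of using pretrained vectors, the Python below parses GloVe's plain-text format (each line is a word followed by its space-separated components) and ranks neighbours by cosine similarity. The three 3-dimensional vectors are made up for illustration and read from an in-memory string instead of a downloaded file; real pretrained files use many more dimensions (for example 100 or 300). The helper names load_glove, cosine and most_similar are hypothetical.

```python
import io
import math


def load_glove(lines):
    """Parse GloVe-style text lines: a word followed by its vector components."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def most_similar(vectors, word, k=1):
    """Return the k words whose vectors have the highest cosine similarity to `word`."""
    target = vectors[word]
    scored = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors, made up for illustration only.
sample = io.StringIO("king 0.8 0.6 0.1\nqueen 0.7 0.7 0.1\napple 0.1 0.0 0.9\n")
vecs = load_glove(sample)
print(most_similar(vecs, "king")[0][0])  # queen
```

With a real file, the same load_glove call would take an opened text file object in place of the StringIO.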
October 25, 2016
Distributed Representation: In basic NLP tasks, each word is treated as a discrete atomic unit. The word vector is filled with 0s and a single 1 marking which word it is. It’s easy to see that this one-hot representation is very sparse, with dimensionality as large as the vocabulary size. It can b...
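A small Python sketch of the one-hot representation described above, assuming a tiny hypothetical four-word vocabulary. Each vector has as many entries as the vocabulary has words, with a single 1 at the word's index, and the vectors of two different words always have a dot product of 0, so this representation carries no notion of similarity between words.

```python
def one_hot_vectors(vocabulary):
    """Map each word to a list of len(vocabulary) zeros with a single 1 at its index."""
    vectors = {}
    for i, word in enumerate(vocabulary):
        vec = [0] * len(vocabulary)
        vec[i] = 1
        vectors[word] = vec
    return vectors


vocab = ["cat", "dog", "hotel", "motel"]
vecs = one_hot_vectors(vocab)
print(vecs["hotel"])  # [0, 0, 1, 0]

# Distinct one-hot vectors are orthogonal, even for related words like "hotel" and "motel".
dot = sum(a * b for a, b in zip(vecs["hotel"], vecs["motel"]))
print(dot)  # 0
```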
June 4, 2017
You might have heard lots of magical stories about reinforcement learning from the media and be curious about what it is. Reinforcement learning is about getting an agent to learn to act given rewards. RL is inspired by behavioral psychology. The process is just like teaching a pet: you don’t tell it...