Tranquil

  • Docker on Load

    I used Docker at work a couple of days ago, so here is a short overview of what Docker is and how to use it. What is Docker? Docker is a popular containerization platform that packages an application together with all of its dependencies into a self-contained runtime environment that is easy to pack and ship. What makes Docker so popular? Co...

    » Read More

  • Regular Expression in C++11

    A regular expression, sometimes abbreviated to regex, is a sequence of characters that describes a search pattern in text. As a simple example, to list all the PNG images in a folder, the regular expression can be as simple as .+\\.png. Here . represents any character and + means 1 or mo...

    » Read More

  • Git Commands Cheatsheet

    Version control is an important concept in software engineering because developing software is cumulative work, and you need to properly manage all the files and their changes. With a good Version Control System (VCS), you can compare changes over time or revert the entire project back to a p...

    » Read More

  • C++ Pointer Quick Tutorial

    Understanding Pointers: Programming is concerned with manipulating data, which is normally located in memory. Besides using an identifier to access data, C++ programs have another way to reach data, based on its address in memory. In memory, each variable has a unique address. Pointer can...

    » Read More

    c++

  • Integer to English Words

    Problem: Convert a non-negative integer to its English words representation. Given input is guaranteed to be less than 2^31 - 1. Solution: As we know, each group of 3 digits in a number corresponds to a scale word: thousand, million, billion, trillion, quadrillion … According to the description, the number is no larger th...

    » Read More

  • Spark Core Programming

    Spark is a fast cluster computing technology. Its in-memory cluster computing feature increases the processing speed of an application. The Resilient Distributed Dataset (RDD) is a fundamental data structure of Spark, which can be created in two ways: by referencing datasets in external st...

    » Read More

  • Probabilistic Topic Modelling

    Topic modelling is a technique to automatically discover the hidden topics in each document. The basic idea is that words with similar meanings will occur in similar documents. A document is then modeled and described by its topic coverage together with word distributions. Latent Semantic Indexing (LSI) and ...

    » Read More

  • Discovering Semantic Vocabularies

    Word embeddings are dense vectors used to represent word meanings. Both Stanford’s GloVe and Google’s word2vec are open-source packages that provide efficient implementations for training these vectors. Many pretrained vectors can be found online. I download the pretrained word vectors from the GloVe ...

    » Read More

  • Vector Representations of Words

    Distributed Representation: In basic NLP tasks, each word is treated as a discrete atomic unit. The word vector is filled with 0s and a single 1 that marks the word. It’s easy to see that this one-hot representation is very sparse, and its dimensionality is as large as the vocabulary size. It can b...

    » Read More