Normalization, Standardization and Rescaling

First, some definitions. Rescaling: "rescaling" a vector means adding or subtracting a constant and then multiplying or dividing by a constant, as you would do to change the units of measurement of the data; for example, to convert a temperature from Celsius to Fahrenheit.
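Here is a minimal sketch of rescaling (not from the original post). The Celsius-to-Fahrenheit conversion is F = C × 9/5 + 32, so every value is multiplied by 1.8 and then shifted by 32. The hypothetical `rescale` shell function below applies any such scale and offset to a column of numbers using awk, in keeping with the command-line tools used elsewhere on this blog.

```shell
#!/usr/bin/env bash
# rescale SCALE OFFSET (hypothetical helper): read one number per line on
# stdin and print value * SCALE + OFFSET, i.e. a linear change of units.
rescale() {
  awk -v scale="$1" -v offset="$2" '{ print $1 * scale + offset }'
}

# Celsius to Fahrenheit: F = C * 9/5 + 32, so scale = 1.8 and offset = 32
printf '0\n100\n-40\n' | rescale 1.8 32
```

The conversion is linear, so the same function can undo it with the inverse scale and offset (1/1.8 and -32/1.8).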

Zinda ho tum !

One of the Bollywood movies I have always loved watching is ZNMD (Zindagi Na Milegi Dobara). Here is a collection of the shayari recited by Farhan Akhtar (as Imran) in ZNMD. A compiled version is on SoundCloud. Apne Hone Par Mujhko Yaqeen Aa Gaya (this poem comes after the trio’s deep-sea dive): Pighle neelam sa behta ye sama, neeli neeli si khamoshiyan, na kahin hai zameen na kahin aasmaan, sarsaraati hui tehniyaan pattiyaan, [Read More]

Converting large CSVs to a nested data structure using Apache Spark

What is Apache Spark? Apache Spark brings fast, in-memory data processing to Hadoop. Its expressive APIs in Scala, Java and Python let data workers efficiently run streaming, machine learning or SQL workloads that need fast, iterative access to datasets. (See the Spark quick start guide.) Problem statement / task: read a lot of really big CSVs (on the order of GBs each) from Hadoop HDFS, clean them, convert them to a nested data structure and write the result to MongoDB using Apache Spark. [Read More]
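The Spark job itself is in the full post. To make "nested data structure" concrete, here is a small local sketch of the reshaping step only, written with sort and awk rather than Spark: flat `key,value` rows are grouped by key into one JSON document per key, one document per line. The `nest_csv` name and the `id`/`items` field names are hypothetical, and the sketch assumes no header row and no quoted commas. At the GB scale described above, this kind of group-by-key step is what one would hand to Spark instead of a single machine.

```shell
#!/usr/bin/env bash
# nest_csv (hypothetical helper): read flat "key,value" CSV rows (no header,
# no quoted commas) and print one JSON document per key, collecting all of
# that key's values into an array. Local illustration only; no Spark here.
nest_csv() {
  # Sort by key so each key's rows are adjacent, then emit one line per key
  sort -t, -k1,1 | awk -F, '
    NR == 1 || $1 != prev {
      if (NR > 1) print "]}"          # close the previous document
      printf "{\"id\": \"%s\", \"items\": [\"%s\"", $1, $2
      prev = $1
      next
    }
    { printf ", \"%s\"", $2 }         # same key: append to its array
    END { if (NR > 0) print "]}" }    # close the last document
  '
}

printf 'u1,apple\nu2,pear\nu1,plum\n' | nest_csv
```

One JSON document per line is the layout `mongoimport` reads by default, which is one way to load small results into MongoDB without Spark.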

Data science and the Unix command line

Note: this article applies only to those who code. I have seen many people struggling with MS Excel to make sense of the data in a large CSV file. I don’t blame them, because most people I have met ignore the standard Unix command-line tools simply because they never cared to learn them. When the data is big (anything above 0.5 GB) and we try to find out even the column names of a CSV file, MS Excel gets stuck and we see the Windows "Not Responding" message. [Read More]
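Here is a minimal sketch of that first step. The hypothetical `csv_columns` function prints the header of a CSV with each column name numbered, and `csv_rows` counts the data rows. Both stream the file with standard tools (head, tail, tr, awk, wc), so they stay responsive on multi-GB files where Excel would hang. They assume a simple CSV with no quoted commas in the header.

```shell
#!/usr/bin/env bash
# csv_columns FILE (hypothetical helper): print the column names of a CSV
# file, numbered, one per line. Only the first line is read, so the size
# of the file does not matter. Assumes no quoted commas in the header.
csv_columns() {
  head -n 1 "$1" | tr -d '\r' | tr ',' '\n' | awk '{ printf "%d\t%s\n", NR, $0 }'
}

# csv_rows FILE (hypothetical helper): count the data rows, i.e. every
# newline-terminated line after the header, in one streaming pass.
csv_rows() {
  tail -n +2 "$1" | wc -l
}
```

Usage: `csv_columns big.csv` and `csv_rows big.csv`. Because `wc -l` counts newline characters, a final line without a trailing newline is not counted.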

A dictionary in your terminal

Owing to my poor vocabulary, I had to look up meanings every now and then. The following script gets you the meaning of any word using bash.

A bash script dictionary

    dict() {
      # Folder for the downloaded pages
      dir=~/.dict
      # Check whether it exists; if not, create it
      [[ -d $dir ]] || mkdir "$dir"
      # Download the respective page from dictionary dot com
      # -q => do it quietly, i.e. print nothing on screen
      # -O => save it as mean
      wget -q -O "$dir/mean" http://dictionary.

[Read More]

Useful bashrc functions

I’m going to share some of my bashrc functions, which save me a lot of time.

Killer

This function finds processes by keyword and kills them. Instead of running ps aux with grep and then killing the process by typing its PID, give this function a keyword and it will help you kill the matching process.

    killer() {
      echo "I'm going to kill these processes"
      ps -ef | grep "$1" | grep -v grep
      echo "Can I ?

[Read More]
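The excerpt stops partway through `killer`. Here is a sketch of the same idea (not the author's code). It replaces the `ps -ef | grep $1 | grep -v grep` pipeline with `pgrep`/`pkill` from the procps package on Linux. `pgrep -a -f` lists matching processes with their full command lines and never matches itself, and `pkill -f` sends them SIGTERM once you confirm. The keyword is treated as a regular expression matched against the whole command line, so a loose keyword can match more than you expect. Check the list before answering y.

```shell
#!/usr/bin/env bash
# killer KEYWORD: list processes whose full command line matches KEYWORD
# (an extended regular expression), ask for confirmation, then SIGTERM them.
# Sketch built on pgrep/pkill (procps); not the original post's function.
killer() {
  local answer
  if [ -z "$1" ]; then
    echo "usage: killer KEYWORD" >&2
    return 2
  fi
  echo "I'm going to kill these processes:"
  # pgrep never lists itself, so no "grep -v grep" is needed
  if ! pgrep -a -f -- "$1"; then
    echo "No process matches '$1'." >&2
    return 1
  fi
  printf 'Can I? [y/N] '
  read -r answer || answer=n
  case "$answer" in
    [Yy]*) pkill -f -- "$1" ;;
    *) echo "Nothing was killed." ;;
  esac
}
```

Usage: `killer firefox` lists the matching processes and kills them only if you type y.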

Configuring Fedora 17 to Use the NITC Local Mirror

This post shows how to configure your system to download packages from the NITC fosscell Fedora mirror. Log in as root and run the following command to create a file in the folder /etc/yum.repos.d (yum only reads files in that folder whose names end in .repo):

    touch /etc/yum.repos.d/fosscellfedora.repo

Copy and paste the following code into the new file using your favourite editor:

    ## NITC fosscell Fedora local mirror for Fedora 17 and 18
    [NITCFedora-updates]
    name=Fedora $releasever - $basearch - Updates
    failovermethod=priority
    baseurl=http://fosscell.

[Read More]

Mirror a website for offline use with wget

Use the following wget command to mirror a website:

    wget -mkpb some-website-url

-m mirrors the entire website.
-k converts the links in the downloaded pages so they are suitable for local viewing.
-p downloads all the files needed to display each page, such as CSS and JS.
-b runs wget in the background.

This method won’t work for many websites, because their servers block wget from downloading. I will update this post soon; we can set a user agent in wget to make it look like a browser. [Read More]
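Here is a sketch of the user-agent workaround mentioned above. The hypothetical `mirror_site` function spells out `-mkp` as long options and adds `--user-agent`, so requests carry a browser-like User-Agent header instead of wget's default one. The default agent string is only an example. `--wait=1` (a one-second pause between requests) is an addition in this sketch to go easier on the server. Some servers block on other signals as well, so this is not guaranteed to work everywhere.

```shell
#!/usr/bin/env bash
# mirror_site URL [USER_AGENT] (hypothetical helper): mirror a site for
# offline viewing while sending a browser-like User-Agent header.
# The default agent string below is only an example.
mirror_site() {
  local url="$1"
  local agent="${2:-Mozilla/5.0 (X11; Linux x86_64)}"
  # --mirror = -m, --convert-links = -k, --page-requisites = -p
  # --wait=1 pauses one second between requests
  wget --mirror --convert-links --page-requisites \
       --wait=1 \
       --user-agent="$agent" "$url"
}
```

Usage: `mirror_site https://example.com/`. To run it in the background as in the original command, add `-b` to the wget line.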

Permanently adjust screen brightness in Ubuntu / Linux Mint

This post describes a fix that resets the screen brightness in Ubuntu / Linux Mint at each boot. It works only if you have a file named brightness in the folder /sys/class/backlight/acpi_video0. It is relevant for Ubuntu versions before 12.04, which had a bug that set the brightness too high on every restart. To check your system's current brightness level:

    cat /sys/class/backlight/acpi_video0/brightness

Change the brightness by changing the value [Read More]
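Here is a minimal sketch of the "change the value" step, assuming the same sysfs layout. The hypothetical `set_brightness` function checks the requested level against `max_brightness` in the same folder before writing it to `brightness`. The folder is a parameter (defaulting to acpi_video0), so the function can be tried on a scratch directory. Writing to /sys needs root. Making the setting permanent still means running it at each boot, which the full post covers.

```shell
#!/usr/bin/env bash
# set_brightness LEVEL [DIR] (hypothetical helper): write LEVEL to
# DIR/brightness after checking it is an integer between 0 and
# the value in DIR/max_brightness.
set_brightness() {
  local level="$1"
  local dir="${2:-/sys/class/backlight/acpi_video0}"
  local max
  # Reject anything that is not a non-negative integer
  case "$level" in
    ''|*[!0-9]*) echo "level must be a non-negative integer" >&2; return 2 ;;
  esac
  # The kernel exposes the highest allowed value next to the current one
  max="$(cat "$dir/max_brightness")" || return 1
  if [ "$level" -gt "$max" ]; then
    echo "level $level is above max_brightness ($max)" >&2
    return 2
  fi
  echo "$level" > "$dir/brightness"
}
```

As root, `set_brightness 5` sets level 5. An equivalent one-liner without the range check is `echo 5 | sudo tee /sys/class/backlight/acpi_video0/brightness`.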