This page contains links to some of my writings on topics that interest me. Usually, they are inspired by problems that I experience in my day-to-day life.
I like pondering over the act or the process of doing something.
Data Validation with TFDV
In this lecture we will go over the basics of data validation. The first half is a talk on the fundamentals: what data validation is, why we should validate our data, and how we can do it. The second half is a hands-on tutorial on using TensorFlow Data Validation (TFDV); the instructions and code can be found in this GitHub repo.
Effortless Parallel Execution with xargs & Friends
Recently, I had to run TensorFlow Data Validation on over 500 public datasets from Kaggle to generate a baseline schema file for further analysis. I chose to do this using the xargs Unix command.
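A minimal sketch of the general pattern, not the exact commands from the post: the temporary directory, the three tiny CSV files, and the `wc -l` job are hypothetical stand-ins for the Kaggle datasets and the per-dataset TFDV run. It assumes an xargs that supports `-0` and `-P` (GNU and BSD versions do; neither flag is required by POSIX).

```shell
#!/bin/sh
# Hypothetical stand-in data: three small CSV files in a temporary directory.
workdir=$(mktemp -d)
for name in a b c; do
    printf 'id,value\n1,x\n2,y\n' > "$workdir/$name.csv"
done

# Run up to 4 jobs at a time, one file per job.
# -print0 / -0 keep filenames with spaces intact; -n 1 passes one file
# per invocation; -P 4 caps the number of concurrent processes.
# The `sh -c` body is a placeholder for the real per-dataset command.
find "$workdir" -name '*.csv' -print0 |
    xargs -0 -n 1 -P 4 sh -c 'wc -l < "$1" > "$1.lines"' _

# Each input now has a sibling ".lines" file holding its line count.
processed=$(find "$workdir" -name '*.lines' | wc -l | tr -d ' ')
echo "processed $processed files"
```

Writing each job's result to its own file, rather than to a shared stdout, keeps the output of concurrent jobs from interleaving.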
Data Smells in Public Datasets
In this talk I will present our recent paper, Data Smells in Public Datasets, which was published at the 1st International Conference on AI Engineering (CAIN) 2022. I will first describe the problem we are trying to solve and the contributions we made, then the methodology we followed and the results we obtained. I will also highlight a select few smells that I personally find interesting and hope will generate some discussion. Finally, I will conclude with some high-level takeaways from our study, along with its limitations and future directions of work.
Aru’s Information Management System (AIMS)
AIMS, or Aru’s Information Management System, is a collection of shell scripts to manage information in plain text. It is inspired by org-mode and tries to replicate the subset of its features that I use frequently. AIMS is tuned entirely towards my workflow as a researcher and how I manage my digital notes.