About Me

I am a Master's student in Computer Science at Arizona State University. I intend to deepen my knowledge of Big Data, DevOps, Artificial Intelligence, and Machine Learning, and I would love to collaborate with people working on projects in these areas.
Feel free to take a look at my projects and resume, or contact me below!

Contact Details

Manav Bagai
mbagai@asu.edu

Education

Tempe, AZ, USA

Masters in Computer Science August 2018 - Present

Aligarh, India

B.Tech in Computer Engineering July 2012 - June 2016

GPA: 8.10/10

Work

Pune, India

Big Data Engineer Nov 2016 - Feb 2018

  • Contributed to the architecture and commercialisation of complex data engineering solutions on the cloud.
  • Worked in the Retail and Healthcare domains with technologies including Hadoop, Spark, Docker, AWS, Python, Java, Scala, Apache Nutch, Neo4j, Play Framework, and SaltStack.
  • Worked on two projects: "Recommendation System and Chatbot" in a four-member team, and "Ingestion, Data Analysis, and Visualization of Cardiovascular Patient Data" in a three-member team.
  • Actively contributed to company products such as XStream and XInterview.

    Internships

    New Delhi, India

    Deep Learning Trainee March 2018 - July 2018

    Gained knowledge of Machine Learning and Deep Learning using libraries and frameworks such as TensorFlow and Keras.

    Data Science Internship in Healthcare March 2018 - July 2018

    Carried out thorough market research on methodologies for automatic epilepsy detection and prediction, and wrote a research paper on the topic under the guidance of Dr. Sukant Khurana.

    Work From Home

    Web Developer - Django March 2018 - April 2018

    Worked on developing a social discovery platform with the Django framework.

    Skills

    Python, Java, and C; familiar with Scala and PHP

    Docker, SaltStack, AWS (EC2 and VPC), Jenkins

    Play Framework, Django

    Hadoop, Spark, Pandas, NumPy

    SQL, Neo4j, Hive

    TensorFlow

    NLP

    Stanford CoreNLP, WordNet

    Apache Nutch

    Apache Airflow

    OS

    Linux: worked on Debian-based (Ubuntu) and Red Hat-based (CentOS and Red Hat) distributions

    Projects (Professional)


    Ingestion, Data Analysis and Visualization of Cardiovascular Patient Data

  • Healthcare-domain project carried out in a 3-member team.
  • Worked on creating the application Docker image with Java, Python, Scala, Hadoop, Spark, Airflow, and Zeppelin.
  • Responsible for creating and deploying the database Docker image with Druid and Superset installed.
  • Contributed to designing and developing ETLs in Spark and Scala.
  • Contributed to optimizing the data algorithm to handle large datasets and run in distributed mode with PySpark.
  • Responsible for creating and maintaining the client environment, which included AWS, Docker, Jenkins, and SaltStack.
  • Responsible for orchestrating the entire workflow using Apache Airflow.
  • Developed a microservice in Play Framework that acts as middleware between the UI and Airflow.
  • Platform and Technologies - Python, Scala, Play Framework, Apache Hadoop, Apache Spark, Docker, R, Apache Airflow, Druid, Apache Superset, MySQL, SaltStack, Jenkins, AWS, SBT, Front End Technologies

    Recommendation System and Chatbot

  • Retail-domain project carried out in a 4-member team.
  • Worked on creating the knowledge base by crawling web data using Apache Nutch. This involved writing an HtmlParseFilter plugin that extracted the useful data with regular expressions.
  • Wrote Python and Pandas scripts to clean the data and generate the required CSV and JSON files.
  • Wrote a REST service in Play Framework to ingest data into Neo4j and query Elasticsearch.
  • Platform and Technologies - Java, Python, Neo4j, Nutch 1.12, Play Framework, Elasticsearch, API.AI, AWS, ANT, SBT, Maven, Front End Technologies.

    Contribution to Exadatum Products

  • Dockerized the company product XStream in a Docker image with Hadoop, Java, Scala, Python, Spark, Zookeeper, Kafka, SQL, Druid, Superset, Hive, Maven, and Redis installed.
  • Architected and developed the automatic AWS instance management system for the company product XInterview using Python and Boto3.
  • Gave company-level training sessions on Docker and PyCharm remote debugging.

    Projects (Academic)


    Query-Focused Multi-Document Summarization

  • This project takes a set of documents and a query as input and generates a summary relevant to the query as output.
  • NLP tasks: pre-processed both the documents and the query by performing stemming and stop-word removal using the Stanford CoreNLP library.
  • The documents were broken into sentences, and the query was expanded by generating synsets using WordNet.
  • Represented sentences and the query as term-frequency vectors and calculated cosine similarity to select the sentences most similar to the query.
  • Platform and Technologies - Java, Stanford CoreNLP, WordNet, Rida-WordNet, Lucene, Maven.
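
The sentence-selection step can be sketched roughly as follows. This is a minimal Python illustration, not the project's Java code: the function names `tf_vector`, `cosine_similarity`, and `rank_sentences` are hypothetical, and plain whitespace tokenization stands in for the CoreNLP stemming and stop-word removal.

```python
import math
from collections import Counter


def tf_vector(text):
    """Build a raw term-frequency vector from whitespace-separated tokens."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def rank_sentences(sentences, query, top_k=2):
    """Return the top_k sentences most similar to the query (assumed selection rule)."""
    query_vec = tf_vector(query)
    scored = [(cosine_similarity(tf_vector(s), query_vec), s) for s in sentences]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:top_k]]
```

Calling `rank_sentences(sentences, query)` on a list of candidate sentences returns those closest to the query, which could then be concatenated into a summary.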

    Text Based Plagiarism Detection

    Developed an efficient technique to detect plagiarism in text-based documents. The project introduced a mechanism that combined substring matching and keyword similarity to improve efficiency. Comparing a suspected document against a very large number of reference documents is time consuming, so a cluster containing only those documents with a high F-Score was created to reduce the comparison time. The F-Score was calculated from the lengths of the plagiarized document and the reference document and the length of their longest common subsequence. A research paper based on this project was published in an international journal in April 2016.

    Platform and Technologies - Python.
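
One plausible way to compute such a score is sketched below. The exact formula used in the project is not given here, so this example assumes the common LCS-based formulation: precision is the LCS length divided by the suspected document's length, recall is the LCS length divided by the reference document's length, and the F-Score is their harmonic mean. The function names and the word-level tokenization are also assumptions.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_f_score(suspect, reference):
    """F-Score from document lengths and LCS length (assumed formulation)."""
    s_tokens = suspect.lower().split()
    r_tokens = reference.lower().split()
    if not s_tokens or not r_tokens:
        return 0.0
    lcs = lcs_length(s_tokens, r_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(s_tokens)  # share of the suspect covered by the LCS
    recall = lcs / len(r_tokens)     # share of the reference covered by the LCS
    return 2 * precision * recall / (precision + recall)
```

Reference documents whose `lcs_f_score` against the suspected document exceeds a chosen threshold would form the smaller cluster used for the detailed comparison.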

    SpeedAhead.com

    Designed SpeedAhead.com, a dynamic website that displays cars and their catalog. The website connected users to car sellers and let them book cars directly online. Its features included adding and removing users, sellers, and cars; viewing the catalog; and user, admin, and seller login.

    Platform and Technologies - PHP, HTML, CSS, Bootstrap, JavaScript, MySQL, Wampp Server

    8085 Microprocessor Simulator

    Developed in C. The project simulated registers (the general-purpose registers, the accumulator, and the program counter) and main memory, along with instructions for data transfer, arithmetic and logical operations, and branching.

    Get In Touch.

    Contact me!
