Are you interested in using Data Science for Social Benefit?
Data science applications help many businesses achieve their objectives and increase revenue. So, why not use data science to create social impact?
We train students to apply data science to problems that really matter and work with real government & NGO data to create real change in society.
Sabudh Foundation welcomes aspiring data scientists to undergo a six-month internship and become a SABUDH FELLOW. The foundation works on data science projects that have real social impact. Interns will learn machine learning and AI from leading lights in industry and academia.
These interns will be working on real-world, high impact problems in areas such as agriculture, governance, healthcare, and education with potential employment offered after the completion of the internship.
- World Class Program in Applied AI.
- 230 Hours of Teaching, 230 Hours of Training, 100 Hours of Immersive Bootcamp, 1000 Hours of Private Data Set Immersion; Learn and Solve 18 Case Studies.
- Learn how to compete and win on Kaggle, with mentorship from Kaggle Grandmasters.
- Learn how to solve key problems by working on real data sets with your newfound algorithmic armory.
- Build completely working AI products using design thinking with your team and also learn how to be a team player during the journey.
- Gain Work Experience in AI.
- Get hired by Global Market Leading Artificial Intelligence Companies.
- Work under Artificial Intelligence Leaders and a Great Team.
6-Week Program
Sabudh has started a 6-week program for students in the 2nd and 3rd years of their undergraduate degree.
The 6-week training covers the following topics:
- Core Python Concepts
- OOP (Object-Oriented Programming) Lessons
- VCS (Version Control System like Git)
- RESTful APIs
- Web Crawling
- Introduction to Machine Learning
Students will also work on projects involving text analysis, NLP, etc.
1. Introduction to Python (20 hours)
Basics of Python and Jupyter notebooks; NumPy; Seaborn; SciPy; Pandas; Matplotlib; Flask
2. Stats with R
• Introduction to Statistics – The R Environment
• Functions in R
• Plotting tools in R
• Cleaning Dirty Data
• Basic Plotting of Data & Outlier Detection
• Numerical Measures for Quantitative Bivariate Data
• Probability and Probability Distributions
• The Normal Probability Distribution
• Sampling Distributions – Central Limit Theorem
• Large-Sample Tests of Hypotheses
• Applications of t-tests & Chi-Squared tests in R
• One-way analysis of variance & Two-way analysis of variance
• The Wilcoxon rank sum test & The Wilcoxon signed-rank test
• Kruskal-Wallis test and Friedman test
• Pearson Correlation & Visualizing Correlation in R
• Spearman rank correlation coefficient
3. Maths for Machine Learning
• Sets and types
• Linear algebra and vector spaces
• Functions and function spaces
• Matrices and linear operators
• Matrix differential calculus
• Statistics and inference
• Information and entropy
• Distributions and modeling
• Linear regression
• Gaussian graphical models
4. Spatial Data Science (15 hours)
Spatial Reference Framework, Spatial Data Acquisition, Geo-visualization, Spatial DBMS and Big Data Systems, Spatial Data Analytics
Spatial tools, viz. GeoDa, QGIS, PyQGIS scripting, R and Python.
5. Data Exploration and Pre-processing (10 hours)
Descriptive Statistics, Basic Plotting of Data, Outlier Detection, Dimensionality Reduction, PCA, MDS, Data Transformation, Missing Values, Normalization, Creating new features, Choosing subset of features
6. Design Thinking (10 hours)
Hands-on sessions on experiential exercises like AI Problem framing experience; AI Storyboarding experience; AI Hypothesis generation experience; RAT: Riskiest Assumption Test; DILO: Day in the life of your customers; Converting user stories into deep AI Product insights, etc.
7. Introduction to Machine Learning (20 hours)
What does it mean to learn from data? Motivational Case Studies, Data Science Process, Key Components of Learning, Population vs. Sample, Decision Boundary, Types of Data, Typical Issues with Data, Types of Learning
Learning as search:
• Instance and Hypothesis Space
• Introduction to Search Algorithms
• Cost Functions
Version Spaces / Perceptron / Linear Regression / Nearest Neighbour
Bayesian Learning:
• Random Variables, Probability Functions, Canonical Probability Distributions, Joint Probability Distribution, Bayes Rule
• Independence, Bayesian Belief Networks, Naïve Bayes, Logistic Regression
Bias-Variance Tradeoff
Estimating Accuracy:
• Train/test split, Cross Validation
• Hypothesis testing
• Different measures in classification/regression: Precision, Recall, AUC, MAPE
8. Supervised Learning (30 hours)
Decision Trees, SVM, Neural Networks, Lasso and Ridge Regression, MARS
Ensemble Models: Random Forests, XGBoost
Conditional Random Fields
Dimensionality Reduction:
• Matrix Factorization
• Topic Models
Time Series Forecasting, Survival Analysis, Text Analysis, Text Classification, Hyperparameter Tuning, Skewed Class Distribution and Cost-based Classification
Case Study: Weather Forecasting
Case Study: Entity Extraction
9. Unsupervised Learning (30 hours)
Markov Models and Hidden Markov Models, Mining Graphs
Clustering: EM, K-means, Hierarchical Clustering, BIRCH, Spectral Clustering, Self-organizing Maps
Association Rules, Deviation Detection, Semi-supervised and Active Learning
Case Study: Viral Marketing & Driving for Fuel Efficiency
10. Deep Learning (60 hours)
Convolutional Neural Networks, Recurrent Neural Networks, Transfer Learning, Generative Adversarial Neural Networks, Deep Reinforcement Learning, Capsule Networks, Autoencoders
More Text Analysis:
• Word Embedding
• Machine Translation
• Discourse Modelling
• Coreference Resolution
• Question Answering Systems
Explaining Deep Models
TensorFlow, Keras
Case Study: Medical Image Analysis
Case Study: ChatBots
11. Cyber Security & Deep Learning (5 hours)
Introduction to AI in information security, working on real-world problems that cover the entire data science and machine learning pipeline in information security: data preparation, exploratory data analysis, data visualization, machine learning and deep learning model preparation, and model evaluation. Build a data processing pipeline in Python and IPython notebooks to find anomalous network behavior and endpoints, and construct production-level machine learning and deep learning models for information security.
Case Study: Data exfiltration detection using anomaly detection & detection of Command and Control (C&C) centers using ML and DL models.
12. Machine Learning at Scale (30 hours)
Mining Data Streams, Hadoop and MapReduce, Spark and MLlib
Case Study: Click Stream Analysis
Case Study: Recommender Systems
Dr. Sarabjot Singh Anand, Co-Founder Tatras Data
Dr. Sarabjot Singh Anand is a Data Geek. He has been involved in the field of data mining since the early 1990s and has derived immense pleasure from developing algorithms, applying them to real-world problems, and training a host of data analysts in his capacity as an academic and data analytics consultant.
Dr. Satnam Singh, Ph.D., Chief Data Scientist, Acalvio Technologies
Dr. Satnam leads security data science development at Acalvio Technologies. Before Acalvio, he worked at several MNCs, including CA Technologies, Samsung, and General Motors. He has 10+ years of work experience in successfully building data products from concept to production.
Dr. Vikas Agrawal, Senior Principal Data Scientist at Oracle Fusion Analytics
Dr. Vikas Agrawal works as a Senior Principal Data Scientist at Oracle Fusion Analytics. His current interests are in automated discovery, anomaly detection, and intelligent context-aware systems, with visual and textual explanations for black-box model predictions using interpretable surrogate models.
Prof. Bhiksha Raj, Professor, LTI, Carnegie Mellon University
LTI Associate Professor Bhiksha Raj has been named to the 2017 class of IEEE fellows for his “contributions to speech recognition,” according to IEEE. He has made several contributions in the areas of robust speech recognition, audio analysis, and signal enhancement, and has pioneered the area of privacy-preserving speech processing.
Prof. Joao Gama, Laboratory of Artificial Intelligence and Decision Support, University of Porto
Prof. Joao Gama received his Licenciado degree from the Faculty of Engineering of the University of Porto, Portugal. In 2000 he received his Ph.D. degree in Computer Science from the Faculty of Sciences of the same university. He joined the Faculty of Economy, where he holds the position of Associate Professor.
Prof. Marko Grobelnik, Chair AI Lab, Jozef Stefan Institute, Digital Champion of Slovenia
Prof. Marko Grobelnik is an expert in the areas of analysis and knowledge discovery in large complex databases. Marko collaborates with major European and US academic institutions and consults for organizations such as British Telecom, Microsoft Research, Nature, New York Times, Bloomberg, and Accenture. Marko is the author of several books in the areas of machine learning, data mining, text mining and semantic technologies, and of many scientific papers. He is also the W3C AC representative for IJS, CEO of the company Quintelligence and co-founder of the company Cycorp Europe.
Mr. Derick Jose, Founding Director, Flutura
Mr. Derick is the co-founder of Flutura Decision Sciences, a niche AI & IIoT company focused on impacting outcomes for the Engineering and Energy industries. Flutura has been rated by Bloomberg as one of the fastest growing machine intelligence companies, and its AI platform Cerebra has been certified to work with Halliburton and Hitachi platforms. Flutura has many paying customers in the upstream and downstream areas of Oil and Gas.
Prof. Nasir Rajpoot, University of Warwick
Prof. Nasir Rajpoot is Professor in Computer Science at the University of Warwick. He also holds an Honorary Scientist position at the Department of Pathology, University Hospitals Coventry & Warwickshire NHS Trust. He is a Senior Member of IEEE and a member of the ACM, the British Association for Cancer Research (BACR), and the European Association for Cancer Research (EACR).
Prof. Ashish Ghosh, ISI Kolkata
Prof. Ashish Ghosh is a Professor & former Head of the Machine Intelligence Unit and the In-charge of the Center for Soft Computing Research at the Indian Statistical Institute, Kolkata. He is a member of the founding team that established the National Center for Soft Computing Research at the Indian Statistical Institute, Kolkata in 2004 with funding from the Department of Science and Technology (DST), Govt. of India.
Mr. Atul Bansal, CEO, Timesys, USA
Mr. Atul Bansal, CEO, oversees the overall strategic direction of Timesys, which provides embedded Linux software, tools, support, services and training to developers worldwide. Timesys is a pioneer and industry leader in embedded Linux, and its products and services have been used across many industries. Timesys has developed its fourth-generation embedded Linux development tool.
Dr. Ganesh Mani, Adjunct Faculty Carnegie Mellon University, USA
Dr. Ganesh is a financial services domain expert and an accomplished entrepreneur in the area of embedding high-value IP in various business processes. One of his prior start-ups was acquired by State Street Bank, thereby creating the nucleus of State Street Global’s Advanced Research Center, which was the knowledge-locus for managing institutional portfolios worth several tens of billions of dollars.
Dr. Sukhjit Singh Sehra, Spatial Data Scientist, Elocity Technologies Inc., Canada
He is an expert in the areas of spatial data analysis and knowledge discovery in large complex databases. In particular, his areas of expertise comprise geographical analysis, text analysis, machine learning, social network analysis, and data visualization. He worked as Assistant Professor in the Department of Computer Science & Engineering at Guru Nanak Dev Engineering College, Ludhiana, Punjab, a prestigious institute of the northern region. He is actively involved in the usability and application of technology to solve the social issues of Punjab.
Dr. Harpreet Singh, PhD, Founder & Co-CEO, Experfy, Harvard Innovation Launch Lab
Dr. Harpreet Singh works at the intersection of Blockchain, AI and Machine Learning, developing advanced cryptoeconomic systems, roadmaps, algorithms and data products. In the past he has served as a Chief Analytics Officer and has also led cross-functional teams in global execution of product development, business strategy, operations, and technology functions. He managed the program management initiatives for sixty technology startups from Citigroup’s e-Citi Venture Portfolio Office. Harpreet subsequently established the Project Management Office for FX Alliance, a global foreign exchange platform, where he was responsible for enabling project and risk management functions for New York, London and Tokyo locations. Harpreet earned Master’s and PhD degrees from Harvard University, where he also served as a faculty member.
Mr. Kulmeet Singh, Chief Executive Officer, Founder, Twistle
Kulmeet spent the last decade in healthcare IT strategy, M&A, and product creation. This started when he founded Medremote, a company focused on changing the economics of medical transcription by using the Internet and speech recognition. He has degrees in Economics from the University of Chicago and in Computer Science from Columbia.
Dr. Sawinder Pal Kaur, Data Science Expert, SAP Labs India
She holds a Ph.D. degree in Applied Mathematics from the University of Connecticut and has around nine years of experience in both academia and industry. Currently, she is building a machine learning product from concept to production in the Telecom industry. She is also consulting on other machine learning products that involve student performance in undergraduate programs at US universities, fault diagnosis in transformers, and recommending the right machines for manufacturing bottles.
Before joining SAP, she was working in Shell India as a data scientist. She has also taught and given training in data science to professionals.
Sukhmeet Singh, Co-Founder at an Analytics and a Green Energy Startup
Sukhmeet Singh has over 15 years of experience in the areas of IT, Analytics, Agriculture and Strategy. Sukhmeet is a cofounder of two Analytics companies: ProdigyNumbers and SportsPanther. Sukhmeet is also engaged with Aston University (UK) and European Bioenergy Research Institute for setting up bio-energy plants in Punjab for converting paddy straw into energy products. In the past, Sukhmeet has worked as Associate Director at the Indian School of Business, working on research projects with Government and Industry in the Areas of IT, Agriculture and SMEs.
Mr. Mandeep Singh Kwatra, Independent Advisor and Consultant, MS Konsulting Ltd.
Mr. Mandeep is a Consulting Partner/Director with global experience across business and technology. He started his career in hands-on software development, moved up to project management and thereafter to management consulting, leading complex transformations and strategic advisory for clients in the Telecom, Education and Services industries. He is an advisor to the Executive Dean, Faculty of Business, University of Plymouth, and has a US Patent to his credit in the area of Collaborative Customer Service Model. Mandeep has been advising start-up companies on their product roadmaps and working with industries and practitioners in adopting and applying disruptive technologies to day-to-day business problems.
Dr. Gurpreet Singh Lehal, Professor, Computer Science Department, Punjabi University, Patiala
Dr. Gurpreet Singh Lehal is a professor in the Computer Science Department, Punjabi University, Patiala and Director of the Advanced Centre for Technical Development of Punjabi Language Literature and Culture. He is noted for his work in the application of computer technology in the use of the Punjabi language both in the Gurmukhi and Shahmukhi script. A post graduate in Mathematics from Panjab University, he did his masters in Computer Science from Thapar Institute of Engineering and Technology and Ph.D. in Computer Science on Gurmukhi Optical Character Recognition (OCR) System from Punjabi University, Patiala.
Mr. Atul Tripathi, Author & Consultant to Govt of India
Mr. Atul Tripathi was a Big Data and Artificial Intelligence consultant in the National Security Council Secretariat (Prime Minister’s Office, New Delhi, India), working on the application of Artificial Intelligence to national security. Atul is a data scientist with 16 years of experience and an interest in Artificial Intelligence, Social Media, and Multilingual Unstructured Data Processing and Analysis. He has also worked extensively on the AI policy and Data Protection Policy for India. He is a member of Leaders Excellence at Harvard Square and GARP (Global Association of Risk Professionals). He has been ranked 6th among the top 10 speakers and thinkers influencing the thoughts of the nation by SpeakIn (Asia’s largest platform of speakers and thought leaders).
Atul has developed strategies and self-learning applications using Artificial Intelligence for National Security, IoT, IoE, IIoT, Indian Railways, Shipping, Human Speech Recognition, and Image Processing. Atul has been teaching Anti Money Laundering, Risk Management, Artificial Intelligence and Policy implementation at various universities, institutions and industries. He is a well-known keynote speaker at various forums. He is an author of Machine Learning Cookbook (PACKT Publications), which has been translated into Chinese. He is an advisor for setting up a data science centre at IISER, Mohali.
Madhukar Kumar – Chief Analytics Officer, Shine.com
Madhukar is Chief Analytics Officer at Shine.com and brings ~14 years of experience in artificial intelligence, data sciences, predictive analytics, machine learning, deep learning, text analytics, image/video analytics and consulting across a multitude of global clients and verticals. He has previously worked with firms like American Express, GE Money, WNS, SG Analytics & Upgrad. He has been conferred “Lifetime distinguished membership” of the Leaders Excellence @ Harvard Square group (an exclusive online leadership community headquartered in Massachusetts, US) and is also a certified Advanced Analytics expert with Experfy, Harvard Innovation Launch Lab, Boston.
Madhukar was featured as one of the CDOs in the news globally by CDO Club New York in Jan 2020. Madhukar received the “AI Leader of the year” award at the International Business and Academic Excellence Awards (IBAE-2019) in Dubai from GISR Foundation & American College of Dubai. His work on image analytics has been well received in the industry. “The Drone image Analytics” solution was judged one of the Top 50 applications of AI and was awarded the Nasscom AI Game Changer 2018 award. It also won the “Stevie Gold award-2018” under the “Best product/service of the year” category in London. He was rated as one of the top 4 data scientists from India by an Australian advisory firm (Swami Associates) for significant contribution in science & technology for Australian firms in 2015.
Madhukar completed his Masters in Applied Statistics & Informatics at IIT Bombay. He is also interested in academics, teaches data science, AI & ML, and nurtures future data scientists across the globe. He is an advisor for setting up a data science center at IISER, Mohali. He is a well-known keynote speaker & moderator at AI & data science conferences. He is also a member of the organizing committee of Gavin conferences for AI & Robotics conference events across the globe.
Madhukar is a passionate athlete and has completed 1 Full Marathon, 5 half marathons, 1 Olympic triathlon (1.5k swimming, 40k cycling & 10k running), 2 Cyclothons (70k & 130k) , Sky Diving & Scuba diving.
Mr. Harsh Dahiya, Founder of Harvesto
Harsh Dahiya is the founder of Harvesto, an impact driven organisation developing technology solutions for farmers and stakeholders in agriculture. Harsh is a computer science engineer and MBA from Cardiff University. He has also worked as a Management Consultant for an impact organisation in the United Kingdom. Harvesto is India’s largest soil testing technology company with its products working in all states of India and over 16 countries worldwide.
Mr. Jaswinder Dhanjal, Founder and CEO of Twinpod Inc.
Seasoned digital health executive with 20+ years of software tech experience with firms in Europe and USA, with last 10+ years in digital health initiatives. Strategic thinker with sound Business and Technology acumen bringing disruption to the Digital Health and Insurance vertical. His Mantra – Keep finding Problems, Keep fixing and innovating.
Founder and CEO of Twinpod Inc, enabling healthcare interactions by connecting Imaging Centers, Physicians and Hospitals. Founder and CTO of DocGiant Inc, Insurtech solutions provider empowering Health plans, TPA and Brokers in making data driven decisions on selection of right cost-containment tools.
Mr. Dev Ganesan, CEO at PathFactory
Mr. Dev Ganesan currently serves as the CEO at PathFactory. He also serves on the Board of Directors of BrandMuscle and volunteers on the Board of DC Central Kitchen. Ganesan received the EY 2017 Entrepreneur-of-the-Year award and the Future 50 award in 2014 and 2013, and was recognized as a Washington Tech Titan in 2013.
Dev has a successful track record of building and scaling companies in the digital, mobile, customer relationship management (CRM), and eCommerce industries. His specialties are in the fields of Strategy, M&A, strategic business development, IPO, fundraising, building teams, go-to-market planning, and execution.
Sabudh Foundation was formed by leading data scientists in the industry, in association with the Punjab government, with the objective of bringing together data and young data scientists to work on focused, collaborative projects for social benefit. We aim to enable the youth to use powerful AI technologies for the greater good of society by working on real-world problems in partnership with nonprofit organizations and government agencies, to tackle data-intensive, high-impact problems in education, healthcare, public policy, agriculture, etc.
HOW IS DATA SCIENCE APPLIED TO SOCIAL GOOD?
Data science can be used across a number of industries to benefit society.
For example, in agriculture, agrobots and drones are now being used to gauge the health of the harvest, which can help farmers improve their crop yield and reduce costs. With the help of advanced technologies, we’re able to save 90% of the spraying costs. These technologies can help states like Punjab, which has always been the food basket of India, to restore food security while improving crop health.
Medicine is another vertical where artificial intelligence has progressed enough to make the right diagnosis and detect disease in time for it to be cured. Punjab has the highest rate of cancer in India: 18 people succumb to the disease every day, according to a recent report published by the state government. Using AI and machine learning algorithms to diagnose the fatal disease at an early stage can significantly decrease the mortality rate.
“ Data Science has always fascinated me, and Sabudh became the platform that helped me explore Data Science. We had a seminar in our college, Guru Nanak Dev Engineering College, by Dr. Sarabjot Singh, where I came to know about Data Science. After that, I wanted to learn more but didn’t know where to start from. Sabudh is the platform which gave me lots of opportunities to explore and learn. It has been an amazing experience interning here and the methodologies taught by the faculty really helps you make your base solid strong. I made new connections, and the exposure you get here is unparalleled.
As the faculty at Sabudh always says “Once part of Sabudh, always a part of Sabudh”, I look forward to being here at Sabudh in my winter vacations too. “
– Karanjot Vilkhu
Guru Nanak Dev Engg College, Ludhiana
“ Interning at Sabudh was a great opportunity. It gave me a wonderful platform to recognize my talent. The whole team was so helpful to work with. The working environment, teaching techniques and guidance which i got here was marvelous. Everyone had the freedom to learn at their own speed. I really enjoyed working here. Machine learning has become my favorite only because of the way we get introduced to its applications and various aspects.
I attended Machine Learning lectures before this internship but the clarity of concepts and the real value i got at Sabudh was incredible. A big thank you to all the members of Sabudh. “
– Jeewanjyot Kaur
UIET, Panjab University
“ I learnt many things like Python, Machine Learning, and Deep Learning.
Other than this we all do group discussions, my fluency in English improved through this process. “
– Amarjeet Singh
“ I absolutely would like to thank Sabudh for this learning opportunity. It was an amazing experience for me professionally and personally. I’m going back with more knowledge on machine learning with their algorithms, but most of all with memories and an unforgettable experience. “
– Kamesh Sharma
Gurukul Kangri University, Haridwar
“ My experience at Sabudh as an intern was very enriching. Everyone was very helpful. I have learnt so many new things which includes python programming language, machine learning etc. I have also changed the way of looking at any problem. I am very grateful for everything i learnt while interning at Sabudh. I will look forward to get the opportunity to learn more and to work with Sabudh again. “
– Manpreet Kaur
“ Interning at Sabudh was an awesome experience. The time passed at the speed of light, it was very hard to leave that place after two months and I was emotional in my last day at Sabudh. With the course of machine learning, we also learnt a lot about ourselves. It helped a lot in individual development. As a group, we worked on improving our communication skills, which helped a lot in building confidence. And we learn to study without spoon-feeding, which i think is need of the hour. The thing which I liked the most about Sabudh is that the environment was very lovely and homelike. People were like family and very helpful. At last the stay with such amazing people is immemorial and I will like to work again. “
– Nishant Indal
“ As a Data science intern here at Sabudh I’m glad that I got an amazing opportunity to be a part of this amazing organization. Moreover, I’m thankful for this golden opportunity which is a great kick-start for my career in the field of Data science. Under the guidance of Dr. Sarabjot who is a master in the field of Data Science and Machine learning, I got the best exposure and knowledge that I could ever get.
Thank you, Sabudh team, for a super friendly working environment and terrific six months. “
– Linu Jose
IT career academy (GGSIPU)
“ Working here at Sabudh as an intern was a huge opportunity. The exposure and guidance that comes along here serves as a boost in the right direction. I learnt the importance of background details and in depth understanding of the basic concepts that I used to implement. I loved the atmosphere, the freedom and stress free work environment which I found here which helped me work on my own terms and complete the project on time. I look forward to working in close collaboration with Sabudh again. “
– Harinder Singh
Guru Nanak Dev Engg College, Ludhiana