Data Science 101 by Puneet Garg, Head of Data Science, Carousell

On Friday, I heard Puneet Garg, the new Head of Data Science at Carousell, give a very passionate AI 101 talk at the GA Data Science Immersive program that I am part of. Puneet spent 13 years at Microsoft before joining Carousell and has over a decade of AI experience. He was part of the Bing team, where he worked on cutting-edge technologies like natural language for Cortana, speech recognition and language translation for Indian languages, human-like chatbots (Ruuh) for India, etc.

Talking about Data Science, he felt that a Data Scientist's key job is to extract insights and identify predictive models from large volumes of data. And in this job, Machine Learning is a key tool in the Data Scientist's arsenal. He used the following graphic to beautifully explain the difference between AI, Machine Learning and Deep Learning.
He highlighted how Artificial Intelligence algorithms have existed for a long time but were infeasible in the past, primarily due to the lack of computational infrastructure. Machine Learning techniques are well suited to complex problems involving unstructured data: they can start with simple rules and go on to identify more complex patterns from the data. The following image explains how Deep Learning uses multiple layers to progressively extract higher-level features from raw input. The early layers pick up simple elements like edges, successive layers combine them into the structure of a face, and through the examination of data the algorithm eventually becomes very good at this activity.
[Image: the relationship between AI, Machine Learning and Deep Learning, and layer-by-layer feature extraction]
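To make the layered idea concrete, here is a minimal sketch of a small convolutional network in Keras. This is my own illustration, not anything shown in the talk: each stacked layer builds on the features of the one before it.

```python
# A minimal sketch (not from the talk) of how a deep network stacks
# layers so that each one builds on the features of the last.
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    # Early convolutional layers tend to learn simple patterns: edges, corners.
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(128, 128, 3)),
    layers.MaxPooling2D((2, 2)),
    # Middle layers combine those into textures and object parts.
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    # Deeper layers capture higher-level structures (e.g. a whole face).
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(10, activation="softmax"),  # e.g. 10 hypothetical classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```

No single layer "understands" a face; the hierarchy as a whole does, which is exactly the progressive extraction the image illustrates.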
Later during the discussion, he highlighted how a lot of tools have come up that have simplified the algorithms themselves and made it easier to deploy and manage these models in production. However, he felt that the critical activity of “Data Preparation”, involving data exploration, wrangling and clean-up, is still very manual, and that this situation would not change any time soon. Elaborating further, he shared how Siri allowed you to assign names to yourself. The cartoon below illustrates the exceptional cases that crop up when you scale a technology like AI, and those situations will ensure the need for human oversight over AI and related technologies.
[Cartoon: the exceptional cases that surface when AI is deployed at scale]
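To make the data preparation point concrete, here is an illustrative pandas snippet of the kind of manual exploration and clean-up he was referring to. The file and column names are entirely hypothetical.

```python
# Illustrative only: the manual wrangling that still dominates a data
# scientist's time. File and column names here are hypothetical.
import pandas as pd

df = pd.read_csv("listings.csv")   # hypothetical raw export
print(df.isna().sum())             # explore: where are the gaps?

df = df.drop_duplicates(subset="listing_id")
df["price"] = pd.to_numeric(df["price"], errors="coerce")  # '$1,200' -> NaN
df = df.dropna(subset=["price", "title"])                  # discard unusable rows
df["title"] = df["title"].str.strip().str.lower()          # normalise text

# Each of these decisions (what counts as a duplicate, which rows to drop)
# still needs a human judgement call, which is why this stays manual.
```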
He then talked about some non-trivial computational problems: traffic/ETA predictions used by Google Maps and Grab, recommendation engines from biggies like Netflix and Amazon, spam detection, web search, etc. In addition to consumer-facing entities, he also talked about how AI is being used in the B2B space, especially with respect to Trust, and even B2D (Business to Developers), where companies are developing solutions for developers, like Cognitive Services (e.g., image recognition tech aimed at the developer community). Anecdotally, he felt that the Singapore Data Science landscape consisted of:
§  Big Tech firms like Amazon, Google, Microsoft, IBM, Twitter, Dell, etc.
§  FinTech efforts of the banks.
§  Startups like Carousell, Grab, PropertyGuru, etc.
§  Others, like the telecom players and institutions like NUS.


Coming to Carousell, he felt that the company represents the 4.0 version of classified ads in an AI-first environment. Carousell's vision is to inspire every person in the world to buy and sell, making more possible for one another. Puneet felt that AI had a critical role in implementing this vision and enhancing the user experience. He also highlighted the three primary priorities for Carousell:
§  Buying Experience: Puneet felt that factors like geolocation and personal preferences could be taken into consideration to make product listings and user recommendations more relevant.
§  Selling Experience: Carousell has made giant strides in improving the seller experience through AI. The effort is directed at helping the seller complete the listing process quickly. This is done through image recognition technologies that detect the product from the image and recommend captions, pricing and other metadata options.
§  Trust and Safety: Spam detection and the detection of bad actors are also priorities for Carousell (a toy spam-filter sketch follows this list).
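To give a flavour of that last bullet, here is a toy spam filter using bag-of-words features and logistic regression, a common first baseline. It is purely illustrative and says nothing about Carousell's actual system.

```python
# A toy spam-detection sketch (not Carousell's system): TF-IDF
# bag-of-words features plus logistic regression as a first baseline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "brand new iphone, meet at mrt",         # legitimate
    "CLICK HERE to win $$$ free iphone",     # spam
    "selling my old guitar, slightly used",  # legitimate
    "earn money fast, whatsapp me now!!!",   # spam
]
labels = [0, 1, 0, 1]  # 1 = spam

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["free money, click here"]))  # likely [1], i.e. spam
```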

Explaining the technical challenges further, he used “Canonical” to describe the standardized nature of the products listed on Amazon. Given the strong religious association of the word, I found it amusing that “Canonical” could be used in this context. Unlike Amazon's, Carousell's listings have a lot of nuances: a simple iPhone listing is affected by how old the device is and by many factors around its working condition. So Puneet's team uses image recognition to help sellers list the product. Unfortunately, image recognition is very challenging: in addition to taking into account environmental conditions like ambient lighting and image angles, the algorithms also need to differentiate the image of an iPhone from that of an iPhone cover, which look very similar.
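As an illustration of how such a fine-grained distinction might be attempted, here is my own sketch using a pretrained backbone with a small task-specific head; the model choice and labels are assumptions, not Carousell's actual approach.

```python
# Sketch of a fine-grained classifier (iPhone vs. iPhone cover) built on a
# pretrained backbone. Illustrative only, not Carousell's actual model.
import tensorflow as tf
from tensorflow.keras import layers, models

backbone = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
backbone.trainable = False  # reuse generic features learned on ImageNet

model = models.Sequential([
    backbone,
    layers.GlobalAveragePooling2D(),
    # The head must learn the subtle cues (screen glare, camera cut-out,
    # material texture) that separate a phone from a look-alike cover.
    layers.Dense(1, activation="sigmoid"),  # 1 = iPhone, 0 = cover
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```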


Carousell trains its models on its sizable internal datasets of items for sale and user interactions. When a seller uploads an image, a Machine Learning based mechanism suggests titles and categories for the listing. This is more complex than a simple classification problem: Carousell does significant image processing and then uses a ranking model to select the correct title out of a pool of candidate titles drawn from its database of millions of listings. This whole title recommendation process completes in under 100 milliseconds.
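My rough mental model of this retrieve-then-rank pattern, with every name and number made up for illustration:

```python
# Highly simplified retrieve-then-rank sketch of title suggestion:
# embed the image, pull nearest candidate titles, then rank them.
# All names and values here are hypothetical.
import numpy as np

def embed_image(image) -> np.ndarray:
    """Stand-in for a real CNN embedding; returns a feature vector."""
    return np.random.default_rng(0).normal(size=128)

# Pretend index: millions of past listings reduced to (title, embedding).
candidate_titles = ["iphone 8 64gb", "iphone 8 case", "samsung s9"]
candidate_vecs = np.random.default_rng(1).normal(size=(3, 128))

def suggest_title(image, top_k: int = 1):
    q = embed_image(image)
    # Retrieval: cosine similarity against the candidate pool. Real systems
    # use approximate nearest-neighbour search (e.g. hashing) to stay fast.
    sims = candidate_vecs @ q / (
        np.linalg.norm(candidate_vecs, axis=1) * np.linalg.norm(q))
    # Ranking: here just the similarity score; a production ranker would add
    # many more features (category priors, click history, text signals).
    order = np.argsort(-sims)[:top_k]
    return [candidate_titles[i] for i in order]

print(suggest_title(image=None))
```

Keeping the whole loop under 100 milliseconds is what forces the approximate retrieval step; scoring millions of candidates exactly would be far too slow.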
Training an algorithm well is one thing; deploying it in a production environment is a completely different ball game. Technologies like TensorFlow, Google Cloud and hashing techniques allow for faster deployments. But even after deployment, these algorithms need to be retrained and kept current; if that process is not done well, the algorithms become stale over time. He used another beautiful word, "Mathemagic", to describe the magic that is possible through the use of maths and data. A very interesting point he then made was regarding the existence of performance upper limits for traditional algorithms. Those upper bounds are changing due to advances in Deep Learning technologies that can process amounts of data previously not possible.
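One common, simple guard against that staleness is to monitor a live metric and trigger retraining when it degrades. A minimal sketch, with made-up thresholds:

```python
# Sketch of one common guard against model staleness: monitor a live
# metric and trigger retraining when it degrades. Thresholds are made up.
def should_retrain(recent_accuracy: float,
                   baseline_accuracy: float,
                   tolerance: float = 0.05) -> bool:
    """Flag retraining when live accuracy drops below baseline - tolerance."""
    return recent_accuracy < baseline_accuracy - tolerance

# e.g. model validated at 0.92 accuracy; last week's live sample scored 0.85
if should_retrain(recent_accuracy=0.85, baseline_accuracy=0.92):
    print("Model is going stale: kick off a retraining job.")
```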

He then made some beautiful suggestions for people with an interest in Data Science:
§  Always keep the end objective in mind. ML is just another means to that end. 
§  ML will never be perfect. So figure out the evaluation parameters first (see the sketch after this list).
§  Spend time with Data. If you look hard enough then data will speak to you. 
§  Start simple and don't just throw data at the algorithm. 
§  Have patience and experiment, because when it works it is like magic. 
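
On the second suggestion, fixing the evaluation parameters first can be as simple as deciding which metric matters before any modelling begins. A small hypothetical example using scikit-learn:

```python
# "Figure out the evaluation parameters first": a minimal example of
# choosing metrics before modelling, using scikit-learn.
from sklearn.metrics import precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # hypothetical spam labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # hypothetical model output

# For spam, a false positive (blocking a good listing) may cost more than a
# false negative, so precision might be the metric to optimise first.
print("precision:", precision_score(y_true, y_pred))  # 0.75
print("recall:   ", recall_score(y_true, y_pred))     # 0.75
```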

Answering my personal question on how generalists like myself could enter the field, he felt that though there is a need for broad skills like communication, domain expertise and knowledge of business strategy, the role of a Business Translator who interfaces between Data Science and the business isn't very large. Hence his advice was to focus on PM and Product Management roles. When pressed further, he advised taking up internship opportunities so that both the candidate and the company could evaluate each other.

Data Science is a comparatively new discipline, and business leaders sometimes find it challenging to leverage this capability. In addition, Data Science project outcomes are less predictable and more iterative, and hence require patience. In such a situation, it is important for the business and the Data Scientists to challenge each other's assumptions and arrive at a way to create value for the organization. He also felt that storytelling will be a key strength, and made a beautiful comment: “Stories that focus on what the audience cares about can drive alignment, and if those stories are based on data, they can generate credibility and trust.”

Puneet's final thought was that AI is still in its infancy, and unlike software engineering, where you can practically copy any code from Stack Overflow and GitHub, AI problems are unique. Hence the commoditization of AI and Deep Learning skills won't happen anytime soon.
