Model Behaviour

From a corner building in central Bengaluru, a bunch of researchers are trying to crack the knottiest problems in computer science.

Model Behaviour by Vikram Shah; Illustration by Akshaya Zachariah for FiftyTwo.in

Paradigm / Shift: Stories of innovation, shaped by intelligence.

Editor’s note: Paradigm Shift is a multimedia series brought to you in partnership with Microsoft India. Listen to an episode of our podcast about Responsible AI, narrated by Harsha Bhogle.

On an overcast morning this May, I sat in on Microsoft Research India’s Lab Sabha. The Lab Sabha is a weekly all-hands meeting to announce plans for social events, listen to the new interns narrate embarrassing true stories (one involved a sodium experiment gone wrong), ask questions, and talk about the minutiae of lab life.

The centrepiece of the Lab Sabha is a technical presentation by a small research group. That week, Yashoteja Prabhu was presenting his team’s work on ‘Compressing Large Language Models for Efficiency.’ Most of it went over my head (they lost me at “layer pruning”), but not before I got a sense of the scale of the problems that Microsoft Research is working to solve.

‘Compression’ is a preoccupation in machine learning research. As we create more and more bytes of digital content, the datasets used to train machines are getting progressively larger. This is improving model accuracy, but raising difficult questions about storage and processing. Getting it right is one thing; doing it faster and cheaper is quite another.

That week, the Lab Sabha was well attended by lab members working on Natural Language Processing, the sub-branch of AI concerned with getting computers to understand human language. The NLP researchers had worked on pioneering projects involving large language models, including one on code-mixing (“Yeh Dil Maange More” is a code-mixed sentence; more on this later). As one of the researchers was sharing a thought with the room, I realised that even the most complex technical research could be stripped down to a neat little question. In this case it was: how can we get computers to translate better?

It’s a question that has caught the official imagination, too. The day before I visited, Infosys co-founder Nandan Nilekani had been to the lab to talk about the opportunities offered by the Union Government’s National Language Translation Mission. NLTM wants to use AI to make texts available in all of India’s 22 official languages. [1]

When the floor was opened for comments at the end, researcher Kalika Bali suggested that Yashoteja connect with Harshita Diddee, a lab member who had been working on compression. Their specific exchange was about “encoders” in “transformer programming,” which I’m afraid your humanities-type correspondent was unable to decode.

But I spent the rest of the day talking to Microsoft researchers about a range of things: the history of the lab, the evolution of India’s technology landscape, cryptography, translation technology, social responsibility, dead ends in research, filter coffee. I came away with a sense that MSR India was a “lab” in the truest sense: bubbling with experiments, ideas and a genuine capacity for wonder.

Doctors in Demand

“You saw the spirit of collaboration at the Sabha, right?” lab director Dr. Sriram Rajamani asked me a short while later. We were chatting in a meeting room named Lalitha, after Ayyalasomayajula Lalitha, India’s first woman engineering graduate. [2] “And this happens every week at the Lab Sabha. Because everybody attends, regardless of their expertise.”

MSR India is the India outpost of Microsoft Research, the research subsidiary of one of the world’s best-known technology companies. Microsoft Research is headquartered on the company’s Redmond campus; other MSR locations include Cambridge (UK), Beijing and Montreal. MSR India is housed in ‘Vigyan,’ a striking glass building on one corner of Lavelle Road, right opposite the Bangalore Club, a plum location if ever there was one. But this was not its first home. When the lab was first set up by P. Anandan in 2005, it operated out of a building in Sadashivnagar.

In the 17 years since, Bangalore has become Bengaluru and the technology landscape has undergone a complete overhaul. “When we came here,” Dr. Rajamani said, “the IT industry in India was predominantly a service industry, building software for the West. The biggest change I’ve seen is the realisation that there is value in building technology to help the local market.”

Multinational companies have made a concerted effort to move up the value chain. “Look at the development centres that are there now in India: ours of course, Google has one, Amazon has one—all employing PhD type talent. The tech ecosystem has significantly matured in terms of quality and talent, and we have actually grown with that ecosystem.”

MSR India employs about 50 computer science PhDs and operates in four broad areas: Algorithms, Machine Learning and AI, Systems, and Technology for Empowerment. “We will be on par with a really good computer science PhD department in a university,” Dr. Rajamani said.

Two lab members have even won the Shanti Swarup Bhatnagar prize for their work. The prize, named for the “father of research laboratories in India” and awarded annually by the Council of Scientific and Industrial Research, is the country’s most prestigious award in multidisciplinary science. In 2016, Venkat Padmanabhan from MSR India won the engineering sciences prize for his work on indoor localisation and smartphone-based sensing. In 2019, Manik Varma was awarded for his contributions to extreme classification, resource-efficient machine learning (the ‘compression’ problem falls under this broad category) and computer vision.

“Look, scientists are always keeping track of who has discovered what, who has won what prize. Perhaps we are the only multinational lab to have two Bhatnagar winners,” Dr. Rajamani said.

The work of the researchers is supported by around 10 post-doctorates, 80 to 90 long-term interns, 100-120 short-term students and a cohort of around 60 Research Fellows. The RFs, as they are called, are typically undergraduates who have not started their PhD. “They spend a year or two with us, and bring a lot of energy,” Dr. Rajamani said. Some go on to do PhDs and others join start-ups. Illustrious RF alumni include Bhavish Aggarwal, Ola co-founder; Krishna Mehra, director of engineering at Meta; and Udai Singh Pawar, who went on to direct a Netflix India film about start-ups.

The lab also employs a bunch of research engineers. “So, think about what happens in our environment,” Dr. Rajamani excitedly said, “Fifty completely geeky scientists and around 25-30 engineers who actually help bring our work to life. One other way our lab is different from academia is that we don’t just write theorems. We actually implement and get them to work—either on our own, or we work with people like Nandan Nilekani on things like UPI, Aadhaar: population-scale projects.”

Indeed, there are many instances when the hard yards at the lab have informed the company’s work. One example is how the work on code-mixing came in handy when Microsoft developers in Hyderabad were localising Cortana, Microsoft’s voice assistant, for the Indian market.

It Takes a Village

Yet, the research division of the company functions quite differently from the product side. “With product development,” Dr. Rajamani said, “you decide strategically at the company-level. You say, ‘We are building these products,’ and then everybody works on them. But research is insurance for the future. We are trying to figure out how the tech landscape is going to look five or ten years down the line. That’s the reason we have this bottom-up energy: people come up with ideas.”

“Redmond, China, all of us lab directors meet very regularly. And we talk about what’s going on and get people to collaborate with each other,” Dr. Rajamani told me.

I’d read about several ongoing projects on the MSR India website. Now, I wanted to understand how ideas got greenlit. “There’s a social process, where ideas get presented repeatedly to colleagues,” Dr. Rajamani responded. “First of all, nobody proposes alone. You attended the presentation right? It has four authors. And there are other people listening who have worked on the problem. Harshita, Kalika, they’ve worked on this problem. They will start talking about it now, having lunch together.”

And then, every quarter, there is a lab-wide workshop where the hot ideas are presented. Everybody just stops everything for two days and attends the workshop. “That is where a lot of coalescing happens,” Dr. Rajamani said. “The ideas that get traction are the ones which multiple people are interested in.”

Unlike product groups, the only resource that research groups have is their own time. “And people put down their time if the idea is more valuable than anything else they are working on,” Dr. Rajamani said. “That’s the culture of the lab. It’s not for people to say ‘This is a good idea, you should work on it.’ It’s to say ‘This is a good idea, I am going to contribute to it.’”

After this, there is a phase of peer-review and evaluation. The researchers write papers which are presented both internally to Microsoft leadership and externally at industry conferences. The ideas that have legs organically bubble up. People simply stop working on ideas that may not make it through this rigorous pipeline. “So my job as lab director,” Dr. Rajamani said, “is to create an environment for a conversation that is critical of the ideas but supports the people.”

Dr. Rajamani’s background is in programming languages. So, apart from being lab director at MSR India, he often has two or three ongoing technical projects. Currently, he is focussed on the question of how AI can help improve productivity in programming. “Look, AI has disrupted everything,” he told me. “Medicine, chemistry, physics. It actually turns out that it is disrupting programming itself. Like a snake biting its own tail! Five years from now, will AI agents write all our code? I think about things like that.”

Like the work on compressing large language models, Dr. Rajamani’s technical work comes under the “geo-blind” category of research that MSR India does. “There was nothing specific about India in the talk, right? We could do this work anywhere in the world. But 10 years ago, this kind of work could never have happened in India. We now have enough intellectual ability to do it.”

The other broad category of MSR India’s work is inspired by India as a context. “We are working with Nilekani on NLTM, we are giving digital employment opportunities to underprivileged people in rural areas, there’s also a piece to help people get credit,” Dr. Rajamani said. “The lab gets its character from these two groups of people working together—the completely high-tech people like at the talk you saw today, and those who are working on technology for development and inclusion.”

Secret Sharing

Speaking of “completely high-tech people,” my next meeting was with Divya Gupta, a researcher who works in a sub-field of cryptography called multi-party computation. The goal of MPC is preserving privacy: can two or more parties jointly compute an output while keeping their inputs a secret? The idea is to do away with the need for a trusted third party who necessarily has to own and hold the data to arrive at conclusions.

One of the ways to understand MPC is through the Millionaires’ Problem, first introduced by computer scientist Andrew Yao in 1982. If two people are interested in knowing which of them is richer without revealing their actual wealth to each other or a third party, how can that be done?

It sounds like magic to me, I told Gupta. “This is the magic I was attracted to in the first place,” she said. As an undergraduate student at IIT-Delhi, Gupta was interested in theoretical computer science. “And cryptography is a part of it. It is mathematically deep.”

A chain of events led her to do a PhD at the University of California, Los Angeles. “I went to UCLA to do a PhD in algorithms. But UCLA is one of the top schools in cryptography. In a reading class in my first year, I happened to read a paper on classical protocols in secure multi-party computation. What are you even talking about, I thought! As I read more, I realised cryptography is filled with many such magical objects.”

At UCLA, the focus of Gupta’s research shifted from algorithms to theoretical cryptography. After a short six-month stint at Berkeley post the PhD, Gupta joined MSR India. “Here, there are a large number of people who talk cross-area,” she said, “I met systems folks, programming languages folks. They started raising questions like ‘how efficient can cryptographic solutions be’? Till then, I’d only thought about what can be done, what is feasible. These people were asking ‘what can we do today?’”

Gupta also found herself having to think through questions about democratisation. “Cryptography is very deep and subtle,” she said. “Knowledge transfer is hard and building a system which is usable is easier said than done. How fast can we carry out a secure inference on real-world machine learning models?” These conversations in the lab inspired Gupta to move somewhat away from theoretical cryptography and towards applied aspects.

The potential applications of MPC sound promising. Gupta gave me an example of how it might be deployed in anti-money laundering investigations. Say, the Enforcement Directorate is investigating a white-collar crime and wants to identify fraudulent transactions that have been routed through several banks. Today, this sort of investigation often means that banks also have to give up data belonging to customers who aren’t under scrutiny. The MPC protocol could theoretically keep that data under wraps.

“MPC is about finding patterns across databases,” Gupta said. “And this actually raises another question I am working on: making MPC frameworks which are user-friendly for non-cryptography experts.”

PPML or Privacy Preserving Machine Learning is a research area that is seeing a lot of activity. That’s because, in commercial contexts, there is more than one party that wants to protect its privacy. “Think about it,” Gupta said, “The machine learning model is also proprietary, it has been trained on sensitive data with a lot of hard work. And clients, of course, could have medical and financial reports. They’d want to learn results without sharing data with the owner of an ML model.”

Even after Gupta’s painstaking efforts to explain how MPC could work (if you’re really interested, start with looking up “Shamir’s secret sharing”), the spell remained unbroken. This is because the concept of MPC is, at its core, based on fairly advanced mathematical operations.
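For readers who do look it up, the core idea fits in a few lines. What follows is a minimal sketch of Shamir's scheme, not anything taken from MSR India's work: a secret is hidden as the constant term of a random polynomial, each party holds one point on the curve, and any `threshold` points are enough to recover the secret while fewer reveal nothing about it. The prime modulus and the function names `split_secret` and `recover_secret` are assumptions chosen for illustration.

```python
import secrets

# Assumption for this sketch: all arithmetic is done modulo a fixed prime.
PRIME = 2**127 - 1  # a Mersenne prime, large enough for small demo secrets


def split_secret(secret, threshold, num_shares):
    """Return num_shares points on a random polynomial of degree threshold-1
    whose constant term is the secret."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in the range [0, PRIME)")
    if not 1 <= threshold <= num_shares:
        raise ValueError("need 1 <= threshold <= num_shares")
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule, highest degree first
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares):
    """Lagrange-interpolate the polynomial at x = 0 to get the constant term."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+)
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


shares = split_secret(1234, threshold=3, num_shares=5)
print(recover_secret(shares[:3]))  # any three shares rebuild the secret
```

The "fairly advanced mathematical operations" here are polynomial interpolation over a finite field; the randomness of the other coefficients is what keeps any group smaller than the threshold in the dark.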

“You know, this is one of the reasons that commercial deployment of MPC is blocked,” Gupta said, while I looked clueless. “This is a chicken-and-egg problem. As a client, how will you demand this level of security when you’re not even contemplating it? Also, who is asking for this strong privacy notion at this point?”

Another obstacle to the commercial application of MPC protocol is the cost factor. “So when you talk about simple computations, addition and multiplication operations cost the same,” Gupta explained. She was referring to the binary arithmetic that computers perform, with 0s and 1s. “The MPC cost profile is completely different. Additions are free but multiplication costs a lot. So that’s why cryptographers will rewrite a function to have a lower number of multiplications and blow up the number of additions. And the systems guys are like, ‘What did you do!’” (I was somewhat relieved to learn that even seasoned programmers were having trouble understanding cryptographic functions.)
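One way to see why additions are "free" is a toy two-party example using additive secret sharing. This is a sketch under stated assumptions, not a description of the protocols Gupta's team builds: the modulus, the helper names, the message counter and the trusted dealer handing out a "Beaver triple" are all illustrative choices. Each party holds a random-looking share of every value. Adding two shared values needs no communication at all, while multiplying them forces the parties to exchange masked values.

```python
import secrets

MOD = 2**64  # assumption: values live in the ring of integers modulo 2^64


def share(value):
    """Split value into two shares that sum to it modulo MOD."""
    r = secrets.randbelow(MOD)
    return r, (value - r) % MOD


def reconstruct(s0, s1):
    return (s0 + s1) % MOD


def add_shares(x, y):
    """Each party adds its own two shares locally; nothing is sent."""
    return (x[0] + y[0]) % MOD, (x[1] + y[1]) % MOD


def multiply_shares(x, y, counter):
    """Multiply shared values using a Beaver triple (a, b, a*b) that a
    hypothetical trusted dealer has shared in advance."""
    a, b = secrets.randbelow(MOD), secrets.randbelow(MOD)
    a_sh, b_sh, c_sh = share(a), share(b), share(a * b % MOD)
    # The parties must open d = x - a and e = y - b. Each one sends its
    # share of d and of e to the other: four values cross the network.
    d = reconstruct((x[0] - a_sh[0]) % MOD, (x[1] - a_sh[1]) % MOD)
    e = reconstruct((y[0] - b_sh[0]) % MOD, (y[1] - b_sh[1]) % MOD)
    counter["messages"] += 4
    # x*y = a*b + d*b + e*a + d*e; party 0 also adds the public d*e term.
    z0 = (c_sh[0] + d * b_sh[0] + e * a_sh[0] + d * e) % MOD
    z1 = (c_sh[1] + d * b_sh[1] + e * a_sh[1]) % MOD
    return z0, z1


counter = {"messages": 0}
x, y = share(6), share(7)
total = add_shares(x, y)
print(reconstruct(*total), counter["messages"])    # sum, with no messages sent
product = multiply_shares(x, y, counter)
print(reconstruct(*product), counter["messages"])  # product, after an exchange
```

In a real deployment the network round trip, not the arithmetic, dominates the cost, which is why rewriting a function to trade multiplications for additions can pay off.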

In addition to developers, Gupta has a friendly beef with another category of people: cryptocurrency folks. “Today, when I tell someone I work in crypto, everyone lights up and asks me which coins to invest in.” ‘Crypto’ is also the name of the field’s biggest global conference, the Woodstock for cryptographers. “It’s been called Crypto for a long time, 40 years or so. Before cryptocurrencies, if you searched on Bing, the top result was the conference web page. Now, it’s a task to find it.” [3]

Mined Your Language

Now that I’d had a taste of the “geo-blind” work happening in the lab, I wanted to speak to someone who’d worked on India-specific problems. It’s why I requested a meeting with Kalika Bali, who has worked extensively on translation technology for low-resource Indian languages.

Her work has taken her outside the lab, bringing her face to face with social and cultural contexts of technology use in India. “See, with low-resource language communities, it’s not like you give them flashy stuff and they go ‘Wow!’” Bali said. “It’s nothing like a science fiction scenario. It makes you realise that there are some very straightforward things we can do to help.”

To illustrate her point, Bali narrated a field anecdote from Naya Raipur, Chhattisgarh. In 2018, MSR India had collaborated with non-profit publisher Pratham Books to build a tool that translated books into Gondi, a language spoken by about 30 lakh people.

“We were showing the people there how to use the framework we had created, to translate from Hindi into Gondi,” Bali said. “And on the very first day, they put together two books in Gondi. It was the first time that those children’s books had been translated into Gondi.” In two weeks, almost 200 books were translated.

For the last couple of years, MSR has collaborated with Pratham Books to develop the Interactive Neural Machine Translation tool. “It came from the Pratham project, and we’ve put it out in the open source now,” Bali said. The beauty of INMT is that it offers predictive text across languages. “So you’re translating from, say, English to Hindi, it will give you suggestions. This is very useful because a lot of amateur translators are not used to the work. As we’re moving towards crowdsourced translations as a method of data procurement, these kinds of tools really help.”

Now, there is even a lighter version of INMT, which can be used on the phone. “The only thing is you need a machine translation model at the back, so we just put out a Hindi-English one for testing,” Bali said. “Now, we’re getting ready to go to the field to test a Hindi-Gondi model.”

On the day we met at Vigyan, Bali had just finished meeting with a group working on gender and accessibility on social media. This piece was part of a wider project that is hoping to land on a long-term view of how intersectionality works in technology.

The origins of this project lay in a study that Bali and a team had conducted last year. The study was centred around women gig workers on the Karya platform. Project Karya was launched by MSR India in 2017. The idea was to crowdsource language data from rural areas to build large language models. Participants were paid for completing tasks like digitising documents written in local languages and recording themselves reading out a sentence.

“What we found on the field was very interesting but not unexpected if you know India,” Bali said. “Women do a lot of work at home. So just assuming that they will sit in one place and do the Karya work is a misunderstanding. A lot of the women we spoke to were working late at night. Many women had a hard time trying to convince male authority figures about the work.”

Arriving at ways to navigate trust-building in this social structure is an important exercise for a technologist. “You have to know how users can and cannot use the technology,” Bali said. “That varies at intersections that we’ve only just started thinking about as technology builders.”

At an institutional level, what is the endgame of something like this intersectionality project, I asked Bali. Is it producing published work? “Publication cannot be the goal of research, that’s short-sighted,” Bali said. “The goal of research is to push the envelope on understanding. That means creating and enhancing knowledge. And if you’re creating knowledge, papers will happen. Papers are just artefacts of the process.”

Research Rollercoaster

With Monojit Choudhury, Bali’s former comrade on several language-related projects, I spoke about endings and evolution. Choudhury is now a principal data and applied scientist at Turing India. Project Turing, named for the British computer scientist Alan Turing, is charged with building large language models that form the backbone of Microsoft’s products. The Turing India team also operates out of Vigyan.

Before this, Choudhury was a principal researcher at MSR India. One of the projects that he had worked on with Bali was the code-mixing one. Put simply, code-mixing refers to the mixing of two or more languages in speech.

“Code-mixed language has its own logic, and it is based on linguistic theories,” Choudhury explained. ‘I am kal going’ for ‘I am going tomorrow’ is not valid, for instance. ‘I am going kal,’ on the other hand, is valid. “We tried to build an algorithm which will take a Hindi sentence and its English translation and generate valid code-mixed sentences.”

Project Mélange kicked off in 2014. The aim was to understand the uses of code-mixing and build tools around it. It was considered worth pursuing then because a large corpus of code-mixed data wasn’t freely available on the internet.

“But if you convert monolingual data to code-mixed data, you get a lot of data,” Choudhury explained. “We threw in interesting machine learning algorithms on top of this artificially generated data and got very good performance on tasks on code-mixed texts.”

By “tasks,” Choudhury meant things like translations and sentiment analysis. Up to 2017, this was state-of-the-art technology. In 2018, a large language model called BERT was built by a team affiliated to Google. [4] “You can imagine this as a model which has at least some knowledge of about 100 languages: words, phrases, etc,” Choudhury said. “This model was powerful. It also mapped between the languages: say, this particular phrase in English is equivalent to this in Hindi.”

When Choudhury’s team tested BERT, they found that it was good at code-mixed translations even though it hadn’t been specifically trained with code-mixed data. “So a large part of our effort to create code-mixed data artificially was lost,” Choudhury said. “All the linguistic theory we used to generate code-mixed data, which we were so proud of, suddenly became obsolete. It’s still beautiful but not useful.”

This kind of research setback is very common, given the shifting landscape of technology. Another kind of setback involves reaching the playground too early. Choudhury graciously gave me an example of this.

A few years ago, MSR India kicked off a project called Artificial Social Intelligence (ASI). By this time, it was clear that technology had gotten very good at processing both structure and meaning in languages. The place where it stumbled was ‘pragmatics.’ Pragmatics can be understood as a context-dependent style or tone. For instance, I could say the same thing to my parents, boss and friends, but the way I say it could be different in all three contexts.

“There are two things here: how to define the style and how to define the social context in which the style is appropriate,” Choudhury explained. A chatbot building project, for instance, could benefit from pragmatics. One could think about the kind of persona one wants to give the bot. Pragmatics is an evolving concept in human relations, too. The tone taken in a tenth meeting with someone could vary drastically from the first.

The researchers worked on the project for a year-and-a-half before they pulled the plug. “We had to stop the project for two reasons,” Choudhury said. “It is too vast to enumerate all the social dimensions in which style factors in. We were not in a position to go and collect data for all of those. How do you model things like geography and cultural nuances?”

The second reason was that the model was unable to solve basic semantic problems. “If you asked a question like ‘How many eyes does a leaf have?’, the model would have come up with an answer like one or two, because such a pattern has never appeared in the data.” Even the GPT-3 model, which was introduced in May 2020 and has been described as “one of the most important AI systems ever produced,” may not be able to answer such a question, which requires common sense.

“This tone thing is a hard problem for humans too,” Choudhury said. “Most human misunderstandings crop up because of something that has been said. In some sense, we were ahead of our time. Maybe five years down the line, when the common-sense problem has been solved, people could get more interested in ASI.”

Choudhury joined MSR India in 2007. For a parting shot, I asked him how the philosophy of the lab itself has changed over the last 15 years.

“It was a very small group of people in Sadashivnagar,” Choudhury said. “Everybody knew everybody’s family, it was a very cosy place. The philosophy then was more about individual brilliance. Put the best minds in their field together and great things will happen.”

In the early days, every researcher was expected to be a thought leader in their field. But now the focus has shifted to impact. The people at MSR India talk about three axes for impact: on society, on science, on industry.

“When you talk about impact,” Choudhury said, “and this is probably something that has happened across all fields, muscle power becomes important. If you look at medical and biology papers in journals like Nature these days, the impactful ones have hundreds of authors. It’s no longer one C.V. Raman who is writing paper after paper and revolutionising spectroscopy.”

And so it is in computer science. Change is constant, and the researchers at MSR India are strapped in for the ride. Still, one thing is unlikely to change in the corridors of Vigyan: its talented people will continue to personify machine learning models. “Beautiful,” they will say with glinting eyes, as they describe an algorithm.

Vikram Shah is associate editor, FiftyTwo.