Researchers Read the Sugary ‘Language’ on Cell Surfaces | Quanta Magazine

When Pascal Gagneux envisions malaria parasites and other pathogens interacting with the surfaces of a host’s cells, he pictures a miniature rainforest with pathogenic organisms flying overhead like colorful birds. The canopy consists of branching sugary molecules that adorn the surface of the cell. “If you’re a malaria parasite, you’re landing on a human red blood cell,” and “the first ‘leaves’ that you touch” are sugars called sialic acids, said Gagneux, an evolutionary biologist at the University of California, San Diego.

His ecological view of that interaction is rooted in his previous field work studying wild chimpanzee behavior in the dense West African forests. During those treks, he began to ask himself: “Why is it that humans and chimpanzees, who share so much of the same DNA, don’t deal with diseases the same way?”

“Diseases that give you and me the sniffles will actually kill chimpanzees,” he explained. But the opposite is also true. Chimps aren’t susceptible to influenza A, and HIV infections turn lethal in humans but stay mild in chimps. The malaria parasites that kill humans can’t infect chimps, and vice versa. This odd selectivity isn’t peculiar to primates: there are countless examples of pathogens devastating certain host species but not others.

Seeking an answer, Gagneux pivoted to the study of the glycomolecules, or glycans, in that “rainforest canopy” that shrouds cells. Glycans are a spectacularly diverse group of complex sugars (polysaccharides). They can exist on their own, as with cellulose, a plant glycan made up of long chains of glucose, or they can be anchored to other biomolecules like proteins and lipids, whose chemical properties they modify. Their structure can be linear (as in cellulose), but they can also be very highly branched, adding to their variety and complexity.

Their endless variation among cells and species is central to why pathogens devastate certain host species but not others. It helps to explain the “spillover” of certain infectious agents, like SARS-CoV-2, from one species to another, leading to global pandemics. But it’s also a key to cellular behaviors even within species, such as the interactions between human sperm and egg and uterine cells.

Now scientists may be verging on a breakthrough in the understanding of glycans and glycobiology. After analyzing a comprehensive data set of glycan structures and their known interactions, researchers at Harvard University and the Massachusetts Institute of Technology found a shared structural “language” that all organisms use when making glycans, like a municipal building code that ensures consistent, compatible architecture. The researchers have released a set of online tools that anyone can use to analyze glycan structures and functions.

Abundant but Mysterious

The shift in Gagneux’s interests happened when he met Ajit Varki, now a physician-scientist and co-director of UCSD’s Glycobiology Research and Training Center. Gagneux said that Varki, who became his mentor, had “just stumbled across the first biochemical difference between humans and chimpanzees.” Varki and his team had found that, more than 2 million years ago, a mutation in humans’ ancestors inactivated a gene that modifies sialic acids in all other primates and most other mammals. As a result, hundreds of millions of sialic acid glycans that are present in other primate cells are missing from human ones.

To Varki, glycans are still “one of the greatest enigmas of the biological universe.” They’re “actually so prominent, they’re a major component of biomass on the planet.” In fact, glycans make up most of the organic matter by mass: cellulose and chitin, the major building material of arthropod exoskeletons and fungal cell walls, are nature’s two most abundant organic polymers. And yet in contrast with the overabundance of glycans, “this whole field has been left behind,” Varki said.

Daniel Bojar, a bioinformatics researcher at the University of Gothenburg and the Wallenberg Center for Molecular and Translational Medicine in Sweden, agrees that our knowledge of glycans pales in comparison to what we know about the other major biopolymers: DNA, RNA and proteins. Glycans, he explained in an email, “are a mysterious, omnipresent entity in biology that we either conveniently ignore or struggle to make sense of.”

According to Varki, the current state of glycobiology harks back to the late 20th century, when major changes were happening in biology. Glycans were heavily researched through the 1970s and the first half of the 1980s. “Glycans were very prominent, with one Nobel Prize every decade. There were very prominent people in many fields studying glycans,” he said.

But as Varki wrote in a 2017 review, “the field of glycosciences originated in ‘descriptive’ carbohydrate chemistry and biochemistry and remained in these domains for a long time,” instead of probing harder questions about the synthesis and functions of glycans.

Meanwhile, major technical advances were accelerating the study of nucleic acids and peptides, long linear molecules directly specified by genetic code templates. In contrast, the complex branching structures of glycans arise through a series of chemical reactions that add and modify sugar residues. There was no corresponding improvement in resources for studying them.

As a result, by the mid-1980s, “DNA, RNA and proteins, all the molecular biology, came and took off and left the glycans behind at the station,” Varki said. That development was disheartening for Varki, who was looking for his first independent research position around that time. But despite the challenges, he told himself, “I’m going to stick with studying these things,” even when many other researchers were giving up on them.

Gagneux said that “quite a lot of molecular biologists are borderline annoyed by glycans,” which are tiny and translucent. “You can only see them if you start throwing things at them that stick to them,” such as lectins, which are proteins that can tag short, distinctive saccharide sequences. Yet neglecting to study these critical components would mean missing game-changing information about some of humankind’s biggest challenges and questions.

Richard Cummings, a professor of surgery at Beth Israel Deaconess Medical Center and Harvard Medical School, describes his “life’s work” as focused on “understanding the structure of complex carbohydrates, these glycomolecules [and] how they’re made.” Glycomolecules, he said, are “the most complex structures that the human body makes.”

Cummings is a co-director of the worldwide Human Glycome Project. He and other researchers on that project, which was only launched in 2018, aim to “sequence and identify all of the glycans and carbohydrate structures (glycomolecules) in humans,” he noted. In contrast, the Human Genome Project launched in 1990 and formally concluded in 2003, illustrating just how big the gap has grown between knowledge of the human genome and the glycome.

Yet it is critical that researchers determine which roles specific glycans play in illness and disease if they hope to develop more effective strategies for preventing and treating these conditions.

Molecular Windows Into Disease

Some of that research is already proving fruitful. Huge strides have been made in the study of a growing group of rare genetic metabolic disorders stemming from defects in glycosylation, according to Varki. “After a slow start in the early 1990s an international effort of many investigators has now resulted in a veritable explosion in discoveries of human genetic disorders of glycosylation,” he wrote in his 2017 review article.

Researchers have already turned to glycomolecules to gain new insights about conditions as diverse as cystic fibrosis, cancers, sickle cell anemia, HIV and COVID-19. For instance, in 2020, Cummings and his colleagues published a Molecular Psychiatry review article covering 25 years of post-mortem brain studies on abnormal glycosylation in people with schizophrenia.

Cummings, who also directs the National Center for Functional Glycomics and the Harvard Medical School Center for Glycoscience, studies the function of glycomolecules in human biology and how mutations or alterations in those functions can cause pathologies. He also investigates how bacteria, parasitic worms and viruses such as influenza infect and sicken humans.

“It turns out in almost any of these cases, it is through interactions of glycomolecules that microorganisms and parasites cause human disease,” Cummings said. Linking that knowledge to new treatments or preventive measures often remains a grand challenge.

Decoding the Language of Glycans

One hurdle for glycobiology, Gagneux noted, is that even closely related species with high levels of genetic similarity, like chimps and humans, have glycans that can vary significantly because of constant, ongoing coevolution. Each species faces its own evolutionary pressures from diseases that leave a mark on its library of glycans: the host glycome evolves to evade or counter pathogens’ attacks, and the pathogens’ glycomes evolve to escape the immune defenses of their potential hosts.

“It gives rise to this molecular arms race that happens differently once you go separate evolutionary ways,” Gagneux said. For instance, even if you inject humans with chimp malaria parasites, they don’t get sick. (“Believe it or not, this was done [in Belgium] in the ’50s,” he said.) That’s partly because the chimp malaria parasites can’t find the blend of sialic acids they seek on human red blood cells.

On the other hand, chimps are highly resistant to cholera because the Vibrio bacterium that causes the disease makes a toxin that targets only the sialic acids on the cells lining the human gut, punching holes through their membranes. Because of host-pathogen coevolution, “there’s a lot of diversity” in the glycome, Cummings said.

That diversity was apparent when scientists at MIT and the Wyss Institute for Biologically Inspired Engineering at Harvard used glycan-focused machine learning models to analyze a data set of more than 19,000 unique glycans. This included “6,969 eukaryotic, 6,119 prokaryotic, and 152 viral glycans,” they wrote in their 2020 Cell Host & Microbe study.

“Because we included all species for which we could find glycans, this dataset constituted a comprehensive snapshot of currently known species-specific glycans,” the researchers wrote.

Bojar, who was a postdoctoral fellow at the Wyss Institute and MIT at the time, is the study’s first author. He and his colleagues identified 1,027 unique simple sugars (monosaccharides) and chemical bonds in the glycan sequences. They treated these as “glycoletters,” which they described as “the smallest units of an alphabet for a glycan language.” They then began looking through the data set for patterns of “glycowords,” defined as sequences five glycoletters long (that is, three monosaccharides linked by two bonds).
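The definition of a glycoword can be made concrete with a small sketch. It assumes a linear glycan written in an IUPAC-condensed-like notation, with bonds in parentheses between sugar names; the toy sequence, the function names and the handling of only unbranched chains are illustrative assumptions, not the study's actual pipeline, which also has to deal with branching.

```python
import re

def glycoletters(glycan):
    """Split a linear glycan string into alternating sugars and bonds."""
    # Assumption: bonds appear in parentheses, e.g. "(b1-4)"; everything
    # else is a monosaccharide name. Branch brackets are not handled.
    return [tok for tok in re.split(r"[()]", glycan) if tok]

def glycowords(letters, size=5):
    """Slide a window of `size` letters, starting only on monosaccharides."""
    # Even indices hold sugars and odd indices hold bonds, so stepping by 2
    # keeps every window in sugar-bond-sugar-bond-sugar order.
    return [tuple(letters[i:i + size])
            for i in range(0, len(letters) - size + 1, 2)]

# Hypothetical toy sequence, chosen only for illustration.
toy = "Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-2)Man"
letters = glycoletters(toy)
words = glycowords(letters)

for word in words:
    print(" ".join(word))
```

A chain of four sugars and three bonds yields seven glycoletters and two overlapping glycowords, each made of three monosaccharides linked by two bonds.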

To that end, they trained a bidirectional recurrent neural network on sequences from their database and used it to create a model for a glycoletter-based language. Such neural networks are commonly used to learn and train language models. “You can kind of think about it as reading a sequence of text forward and then reading it backward,” said Rani Powers, a senior staff scientist at the Wyss Institute and a researcher on the study. “You want to keep the context of what is essentially the sentence in this case, rather than just pulling out all of the words or all of the letters out of context.”
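The idea of reading a sequence forward and then backward can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration: the scalar hidden state, the fixed weights and the `embed` stand-in for a learned embedding are hypothetical, nothing is trained, and this is not the researchers' model.

```python
import math

def embed(token):
    """Map a token to a small deterministic number (stand-in for a learned embedding)."""
    return (sum(ord(c) for c in token) % 17) / 17.0

def run_rnn(tokens, w_x=0.8, w_h=0.5):
    """One-directional recurrent pass with a scalar hidden state."""
    h, states = 0.0, []
    for tok in tokens:
        # The new state mixes the current token with everything read so far.
        h = math.tanh(w_x * embed(tok) + w_h * h)
        states.append(h)
    return states

def bidirectional(tokens):
    """Pair each position's left-to-right state with its right-to-left state."""
    fwd = run_rnn(tokens)
    bwd = run_rnn(tokens[::-1])[::-1]  # read reversed, then realign to positions
    return list(zip(fwd, bwd))

# Two hypothetical glycoletter sequences that differ only in the last sugar.
seq_a = ["Gal", "b1-4", "GlcNAc", "b1-2", "Man"]
seq_b = ["Gal", "b1-4", "GlcNAc", "b1-2", "Fuc"]
states_a = bidirectional(seq_a)
states_b = bidirectional(seq_b)
```

Changing the final sugar leaves the forward state at the first position untouched but alters the backward state there, which is how a bidirectional pass lets every position see context from both ends of the sequence.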

In theory, the glycoletters in the data set could have formed nearly 1.2 trillion different glycowords. Yet, surprisingly, the researchers’ results indicated that only 19,866 distinct glycowords were present across all the available sequences. Notwithstanding the immense complexity and diversity of glycans, and the differences in glycans that are characteristic of various species, the evidence suggested that all organisms follow very similar rules in assembling them and use essentially the same biomolecular language to define their structure.
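A back-of-the-envelope sketch shows how sparse that usage is. The article does not say how the 1.2 trillion figure was derived, so the counting function below is an assumption: it treats each of the three sugar slots and two bond slots as freely fillable, and the toy alphabet sizes are hypothetical rather than the study's actual split of its 1,027 glycoletters. Only the two constants come from the article.

```python
def possible_glycowords(n_sugars, n_bonds):
    """Count sugar-bond-sugar-bond-sugar combinations if every slot were free."""
    return n_sugars ** 3 * n_bonds ** 2

# Toy alphabet with hypothetical sizes, not the study's actual counts.
toy_possible = possible_glycowords(n_sugars=4, n_bonds=3)

# Figures quoted in the article.
THEORETICAL = 1.2e12  # "nearly 1.2 trillion" possible glycowords
OBSERVED = 19_866     # distinct glycowords actually found

observed_fraction = OBSERVED / THEORETICAL
print(f"toy space: {toy_possible} possible words")
print(f"observed share of theoretical space: {observed_fraction:.2e}")
```

On the article's figures, fewer than two in every 100 million theoretically possible glycowords actually appear in the data.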

The researchers discovered that by fine-tuning their models, they could predict with high accuracy the taxonomic groups of the organisms from which glycans came. Furthermore, they were able to train the models to predict with about 92% accuracy whether glycan sequences in a reference data set were immunogenic to humans.

The results are “very exciting,” and further application of sophisticated computational tools to understanding glycans could turn out to be “essential and revelatory,” said Lara Mahal, a glycomics researcher at the University of Alberta who was not involved with the study. (She is working on a different project with Bojar.) “It helps reduce the complexity of glycans into clear patterns from which we can gather essential information, for example on the pathogenicity of glycans,” she added.

The Wyss and MIT researchers hope that other teams will use the tools for glycomic design and analysis that they have developed and posted free online. According to Bojar, their most immediately useful application may be in the pharmaceutical industry, for glycoengineering therapeutic monoclonal antibodies. Antibody proteins latch onto specific antigen targets on pathogens. But it is the glycans linked to the proteins that determine how the antibodies interact with the rest of the body’s defenses and help to direct what kind of immune response follows. In the future, Bojar said, the tools might be able to suggest glycans that would improve the performance of antibodies, for example by limiting their side effects or more precisely calibrating their half-life in the body.

Mahal noted that she is already using the tools to learn more about the specificity of the assays used to identify the glycans on cells. “These new computational technologies combined with high-throughput analysis will revolutionize our understanding of the glycome and its role in disease,” she said.

Original content at: www.quantamagazine.org…
Author: Rachel Crowell