Deep learning cracks the code of messenger RNAs and protein-coding potential

24 Jul 2018

Researchers at Oregon State University have used deep learning to decipher which ribonucleic acids have the potential to encode proteins.

The gated recurrent neural network developed in the College of Science and College of Engineering is an important step toward better understanding RNA, one of life's fundamental, essential molecules.

Unlocking the mysteries of RNA means knowing its connections to human health and disease.

Deep learning, a type of machine learning not based on task-specific algorithms, is a powerful tool for solving the puzzle.

"Deep learning may seem scary to some people, but at the end of the day, it's just crunching numbers," said David Hendrix, the study's lead author. "It's a tool just like calculus or linear algebra, one that we can use to learn biological patterns. The amount of sequencing data we have now is huge, and deep learning is well suited to face the challenges associated with the vast amount of data and to learn new biological rules that characterize the function of these molecules."

RNA is transcribed from DNA, the other nucleic acid, to produce the proteins needed throughout the body. Nucleic acids are so named because they were first discovered in the cell nuclei of living things.

DNA contains a person's hereditary information, and RNA acts as the messenger that delivers the information's coded instructions to the protein-manufacturing sites within the cells.

Some RNAs are functional molecules transcribed from DNA that aren't translated into proteins. These are known as non-coding RNAs.

Every day, new RNAs are discovered, and gene sequencing technology has advanced to the point that molecular biologists are facing a "torrent" of new transcript annotations to glean information from, Hendrix said.

These vast datasets require new approaches, said the researcher, an assistant professor with joint appointments in biochemistry/biophysics and computer science.

Hendrix and colleagues trained a gated recurrent neural network on both non-coding and messenger RNA sequences, then turned it loose on the data to "learn the defining characteristics of protein-coding transcripts on its own."

It did, with remarkable improvement over existing state-of-the-art methods for predicting protein-coding potential.
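To give a rough sense of what a gated recurrent network does with an RNA sequence, here is a minimal sketch in plain Python. It is not the OSU team's model: the GRU-style update, the hidden size of 3, the random untrained weights, and names such as `gru_step` and `coding_score` are all assumptions made for illustration, and bias terms are left out for brevity.

```python
import math
import random

NUCLEOTIDES = "ACGU"

def one_hot(nt):
    """Encode one nucleotide as a 4-element indicator vector."""
    return [1.0 if nt == n else 0.0 for n in NUCLEOTIDES]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def make_params(hidden, rng):
    # Untrained placeholder weights (assumption); a real model would
    # learn these from labeled coding and non-coding transcripts.
    p = {}
    for gate in ("z", "r", "h"):
        p["W" + gate] = random_matrix(hidden, 4, rng)       # input weights
        p["U" + gate] = random_matrix(hidden, hidden, rng)  # recurrent weights
    p["w_out"] = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    return p

def gru_step(p, x, h):
    """One gated update: the gates decide how much old state to keep."""
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h))]
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b)
            for a, b in zip(matvec(p["Wh"], x), matvec(p["Uh"], rh))]
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def coding_score(p, rna, hidden):
    """Read the sequence left to right and return a score between 0 and 1."""
    h = [0.0] * hidden
    for nt in rna:
        h = gru_step(p, one_hot(nt), h)
    return sigmoid(sum(w * hi for w, hi in zip(p["w_out"], h)))

rng = random.Random(0)
params = make_params(3, rng)
score = coding_score(params, "AUGGCUUAA", 3)  # made-up sequence
print(score)
```

Because the weights here are random, the printed score carries no biological meaning. The point is only that the network sees raw nucleotides one at a time and is never told what a codon or an open reading frame is.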

"It's really exciting," Hendrix said. "With the competing programs, developers would tell the program what an open reading frame is, what a start codon is, what a stop codon is. We thought it would be better to have a more de novo approach where the neural network can learn independently."

A codon is a sequence of three nucleotides. Nucleotides are the basic structural units of nucleic acids. Codons act like a translator between the nucleotides in DNA and RNA and the 20 amino acids behind protein synthesis.
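As a small illustration of how codons map nucleotides to amino acids, the sketch below splits an RNA string into triplets and looks each one up. The table is deliberately partial (the standard genetic code has 64 codons), and the sequence and function names are made up for this example.

```python
# Partial codon table for illustration only; the standard code has 64 entries.
CODON_TABLE = {
    "AUG": "Met",   # also the usual start codon
    "UUU": "Phe",
    "GGC": "Gly",
    "GCU": "Ala",
    "UAA": "Stop",
    "UAG": "Stop",
    "UGA": "Stop",
}

def split_codons(rna, frame=0):
    """Split an RNA string into consecutive three-nucleotide codons."""
    # Drop any trailing nucleotides that do not fill a whole codon.
    usable = len(rna) - (len(rna) - frame) % 3
    return [rna[i:i + 3] for i in range(frame, usable, 3)]

def translate(rna):
    """Translate codon by codon until a stop codon or an unlisted codon."""
    peptide = []
    for codon in split_codons(rna):
        residue = CODON_TABLE.get(codon)
        if residue is None or residue == "Stop":
            break
        peptide.append(residue)
    return peptide

example = "AUGUUUGGCGCUUAA"  # made-up sequence
print(split_codons(example))  # ['AUG', 'UUU', 'GGC', 'GCU', 'UAA']
print(translate(example))     # ['Met', 'Phe', 'Gly', 'Ala']
```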

Compared to other approaches, the model that the OSU team developed, known as mRNN, was better by a statistically significant margin in nearly every available metric.

"It not only found stop codons, it distinguished real stop codons from other trinucleotides that match stop codons and recognized long-range dependencies in the sequences," Hendrix said. "It doesn't wait to see a stop codon. We found it makes its decision long before the stop codon, 200 nucleotides from the start codon. And it learned a subset of codons that were highly predictive of protein-coding potential when observed in a potential open reading frame."
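The difference between a real stop codon and a trinucleotide that merely matches one comes down to reading frame. The sketch below spells that rule out by hand, which is the kind of explicit instruction the earlier programs were given; mRNN, by contrast, is described as learning the distinction from data. The sequence and function names are invented for illustration.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def stop_like_triplets(rna):
    """Positions of every trinucleotide matching a stop codon, in any frame."""
    return [i for i in range(len(rna) - 2) if rna[i:i + 3] in STOP_CODONS]

def first_in_frame_stop(rna, start):
    """Position of the first stop codon in the same frame as `start`, or None."""
    for i in range(start, len(rna) - 2, 3):
        if rna[i:i + 3] in STOP_CODONS:
            return i
    return None

# Made-up sequence: GG AUG AAU GAC CUA GGC UAA GG
rna = "GGAUGAAUGACCUAGGCUAAGG"
start = rna.find("AUG")                  # start codon at position 2

print(stop_like_triplets(rna))           # [3, 7, 12, 17]
print(first_in_frame_stop(rna, start))   # 17
```

Four trinucleotides in this sequence look like stop codons, but only the one at position 17 sits in the reading frame set by the start codon, so it is the only one that would end translation.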

Hendrix and colleagues dubbed these special codons "TICs," short for translation-indicating codons.
