MicroRNA-like small RNAs (milRNAs), 21-22 nucleotides in length, are a type of small non-coding RNA first discovered in Neurospora crassa in 2010. Identifying milRNAs in species that lack genomic information is a difficult problem. In this article, a set of knowledge-based energy features is proposed to identify milRNAs by combining the k-mer scheme from bioinformatics with the distance-dependent pair potential from statistical physics. Compared with the k-mer scheme, the features developed here alleviate the curse of dimensionality inherent in the k-mer scheme once k becomes large. In addition, milRNApredictor, built on the novel features, performs comparably to the k-mer scheme and achieves a precision of 75.34%, an accuracy of 74.96%, a sensitivity of 74.21%, and a specificity of 75.72% in 10-fold cross-validation. Furthermore, for novel miRNA prediction, the results from milRNApredictor overlap highly with those of the state-of-the-art tool mirnovo. However, milRNApredictor is simpler and more flexible to use, with fewer requirements for input data and fewer dependencies. Taken together, milRNApredictor can be used to identify milRNAs and other very short small RNAs of non-model organisms de novo, without precursor or genome information.
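The dimensionality argument above can be illustrated with a minimal Python sketch. It is not the milRNApredictor implementation: the function names, the made-up 22-nt sequence, and the choice of raw counts are assumptions for illustration only, and the conversion of pair counts into knowledge-based energies is not described in this summary, so it is left out. The sketch only shows that a k-mer feature vector grows as 4^k, while counts of nucleotide pairs at a given separation grow as 16 times the maximum separation.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGU"


def kmer_counts(seq, k):
    """Count overlapping k-mers; the feature vector has 4**k entries."""
    keys = ["".join(p) for p in product(ALPHABET, repeat=k)]
    observed = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # k-mers containing characters outside ACGU are silently ignored.
    return {key: observed.get(key, 0) for key in keys}


def pair_distance_counts(seq, max_dist):
    """Count nucleotide pairs (a, b) separated by d positions, 1 <= d <= max_dist.

    The feature vector has 16 * max_dist entries. This is a hypothetical
    stand-in for distance-dependent pair statistics, not the published features.
    """
    counts = {
        (a, b, d): 0
        for a in ALPHABET
        for b in ALPHABET
        for d in range(1, max_dist + 1)
    }
    for i, a in enumerate(seq):
        for d in range(1, max_dist + 1):
            if i + d >= len(seq):
                break
            key = (a, seq[i + d], d)
            if key in counts:  # skip ambiguous bases such as N
                counts[key] += 1
    return counts


# Made-up 22-nt sequence, used only to exercise the functions.
seq = "AUGCUAGCUAGGAUCCGAUAGC"

for k in (3, 6):
    n_kmer = len(kmer_counts(seq, k))
    n_pair = len(pair_distance_counts(seq, k))
    print(f"k={k}: k-mer features={n_kmer}, pair-distance features={n_pair}")
```

With a sequence of only 21-22 nucleotides, most of the 4096 possible 6-mers cannot occur at all, which is the sparsity problem that the pair-based features are meant to avoid.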
Download milRNApredictor standalone program: milRNApredictor.zip
Reference:
Yuangen Yao*, Huiyu Zhang, Haiyou Deng. milRNApredictor: genome-free prediction of fungi milRNAs by incorporating k-mer scheme and distance-dependent pair potential. Genomics. 2020, 112(3): 2233-2240.