2 Systematic prediction and annotation of included genomes
Many web services support research on model organisms, but comparable platforms are scarce for non-model organisms, particularly medicinal plants. Although the genomes of more than 100 medicinal plants have been sequenced, only 20 have been released with gene annotations in NCBI. Even among these, some gene annotations show obvious information loss. This makes it difficult to use the reference genomes consistently, and harder still to identify shared gene IDs when comparing data from different studies.
Here we built a Nextflow-based pipeline for systematic gene prediction that integrates three lines of evidence: ab initio prediction, RNA-seq-based prediction, and homologous-protein-based prediction. The final gene sets were then passed to EggNOG and PFAM for functional annotation, yielding Gene Ontology and KEGG annotations for these genes.

Figure 2.1: Basic workflow of the gene annotation process.
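
As an illustration of the last step, the sketch below shows one way per-gene Gene Ontology and KEGG terms could be collected from a functional annotation table. It is not the pipeline's own code. It assumes a tab-separated, eggNOG-mapper-style table whose header line starts with "#query" and which has columns named "GOs" and "KEGG_ko", with "-" marking missing values; these column names and the function name parse_annotations are assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict


def parse_annotations(handle, go_col="GOs", ko_col="KEGG_ko"):
    """Collect GO terms and KEGG orthologs per gene from a tab-separated table.

    Assumed layout (hypothetical, eggNOG-mapper-style): '##' comment lines,
    one header line beginning with '#query', then one row per gene.
    """
    go_terms = defaultdict(set)
    kegg_kos = defaultdict(set)
    header = None
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row or row[0].startswith("##"):
            continue  # skip blank lines and '##' comment lines
        if header is None:
            # First remaining line is the header; drop its leading '#'.
            header = [name.lstrip("#") for name in row]
            continue
        if row[0].startswith("#"):
            continue  # skip any later comment lines
        record = dict(zip(header, row))
        gene = record["query"]
        for column, target in ((go_col, go_terms), (ko_col, kegg_kos)):
            value = record.get(column, "-")
            if value and value != "-":
                # Terms are assumed to be comma-separated within a cell.
                target[gene].update(value.split(","))
    return dict(go_terms), dict(kegg_kos)


# Small in-memory example with made-up gene IDs.
demo_table = (
    "## example annotation table\n"
    "#query\tGOs\tKEGG_ko\n"
    "geneA\tGO:0008150,GO:0003674\tko:K00001\n"
    "geneB\t-\t-\n"
)
go_by_gene, ko_by_gene = parse_annotations(io.StringIO(demo_table))
```

Looking columns up by header name rather than by position keeps the sketch tolerant of tables with additional or reordered columns. Genes without any annotation, such as geneB in the example, are simply absent from the returned dictionaries.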