Since the very first release of Mnova, we have been (and still are!) very fortunate to include in the software the prediction of NMR spectra provided by Modgraph Consultants. It is really a privilege to offer prediction capabilities developed by the pioneers and leaders in the field for so many years. Nowadays, when Machine and Deep Learning techniques are so popular, it is worth remembering that the predictions commercialized by Modgraph have used Neural Networks (in addition to other methods, vide infra) for more than 25 years already.

Modgraph pioneered the concept of combining several NMR prediction methods (see, for instance ). For example, in the case of 13C prediction, Modgraph combines a Neural Network prediction with the so-called HOSE prediction ( ), both developed by Professor Wolfgang Robien of the University of Vienna and recognized for many years as offering the most reliable 13C NMR predictions. They have implemented a procedure that selects the "best" 13C prediction for each atom from the two prediction methods (Neural Network + HOSE).

Leaving aside slow quantum mechanical (QM) calculations, usually carried out with the gauge-invariant atomic orbital (GIAO) method, all fast NMR prediction methods use, in one form or another, databases of assigned data. In a nutshell, we can consider the following fast prediction approaches:

- Increments methods (look-up tables of fragments)
- HOSE codes based methods
- Machine Learning (ML) methods

The first two methods perform quite accurately as long as the predicted atom is well represented in the internal database. However, no matter how large the database used, it is going to be extremely tiny compared to the actual chemical space. For example, there are more than 166 billion organic molecules with up to 17 atoms of C, N, O, S, and halogens, sizes compatible with many drugs. And if we consider larger molecules (e.g. MW up to 500), the best guess for the number of plausible compounds is around 10^60. There is therefore an effectively endless frontier. On the other hand, whilst ML methods are known to show a higher extrapolation/interpolation power, the accuracy of the prediction will nevertheless be compromised to some extent by how similar the chemical environment of the atom to be predicted is to the data set used to train the ML model.

In summary, the accuracy of these fast NMR predictors depends, to a greater or lesser extent, on the contents of the assigned database. And whilst there are very large databases, such as those provided by Modgraph 13C NMRPredict, it will be impossible to cover the full chemical universe.

But what if different prediction algorithms, trained with different data sets, are used together? In principle, one might expect that any deficiency of one of the available predictors can be compensated by any of the other predictors. Actually, in the field of Machine Learning, the concept of combining multiple learning algorithms (e.g. classifiers) is quite popular and is known as Ensemble Learning ( ). This could help not only to get more accurate predictions but also to reduce the number of prediction outliers, cases where the value predicted by a particular individual predictor is exceptionally poor.

Suppose that a chemical shift has been estimated by several ‘predictors’ and that each estimator can characterize in some way the ‘reliability’ of its own estimate (i.e. a confidence interval). How should one combine the various partial predictions and derive the statistical characteristics of the final outcome? This is a wide area of study, as there are many different special situations (contexts). A very naïve approach would consist in simply averaging the values. A weighted average using the confidence interval of each predictor would probably be a better approach.

Based on the idea that different predictors can be combined to form a potentially better predictor, we have added more predictors in Mnova 14, which are combined (using a Bayesian approach) with two main objectives: (1) improve overall prediction accuracy, and (2) reduce the number of prediction outliers.
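To make the naïve-average vs. weighted-average contrast concrete, here is a minimal sketch of one standard confidence-weighted scheme: inverse-variance weighting, which is the Bayesian posterior mean when each predictor's error is treated as an independent Gaussian. This is only an illustration of the general idea, not Mnova's actual combination algorithm; the function name `combine_shifts` and the example shift values are hypothetical.

```python
# Sketch of confidence-weighted combination of chemical-shift predictions
# (inverse-variance weighting). Illustrative only, NOT Mnova's algorithm;
# the function name and example values are hypothetical.

def combine_shifts(estimates):
    """Combine (shift_ppm, sigma_ppm) estimates into a single value.

    Weighting each estimate by 1/sigma^2 yields the posterior mean under
    independent Gaussian errors; the combined sigma comes out smaller
    than that of the best individual predictor.
    """
    if not estimates:
        raise ValueError("no predictions to combine")
    weights = [1.0 / (sigma * sigma) for _, sigma in estimates]
    total = sum(weights)
    mean = sum(w * shift for (shift, _), w in zip(estimates, weights)) / total
    sigma = (1.0 / total) ** 0.5  # uncertainty of the combined estimate
    return mean, sigma

# Three hypothetical 13C predictions (ppm) for the same atom:
preds = [(128.5, 0.8), (129.4, 2.5), (128.7, 1.2)]
naive = sum(s for s, _ in preds) / len(preds)  # plain average ≈ 128.87
combined, sigma = combine_shifts(preds)        # ≈ 128.62 ± 0.64
```

Note how the least reliable estimate (129.4 ± 2.5) barely moves the weighted result, while it pulls the plain average noticeably: this is exactly the outlier-damping behaviour that motivates weighting by each predictor's own reliability.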