Prediction of white shrimp harvest: the case of a small shrimp farm in Tenguel, Guayaquil-Ecuador
DOI: https://doi.org/10.46661/revmetodoscuanteconempresa.3791
Keywords: prediction, harvest, white shrimp Litopenaeus vannamei, statistical learning, cross-validation, MARS
Abstract
The shrimp sector in Ecuador is currently one of the fastest-growing non-oil sectors oriented toward the international market. Despite this growth, to our knowledge most small shrimp producers in Ecuador make important operational decisions based on empirical knowledge, without considering historical data or any scientific tool. In this work we implement and compare state-of-the-art statistical learning techniques for predicting shrimp harvest (in pounds) at a small shrimp farm located in Tenguel, Guayaquil-Ecuador. For this study we used historical information collected by the farm's biologist, which the authors organized and put into digital format. Data from n = 35 past harvests, corresponding to 7 production cycles, were used to train the models. We then predicted shrimp harvest for the next two production cycles. We compared multiple linear regression fitted by ordinary least squares, CART regression trees, random forests, multivariate adaptive regression splines (MARS) and support vector machines (SVM). In our analysis, MARS with no interaction terms allowed, linear regression with best-subset variable selection, and SVM with a linear kernel gave the lowest prediction error estimates by cross-validation. Their good predictive performance was confirmed by accurate predictions on the next two production cycles. The use of statistical techniques can thus greatly help small shrimp farms improve their predictions and, in turn, their operational processes.
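The model comparison described in the abstract rests on estimating each model's prediction error by cross-validation. The following is a minimal sketch of that idea, using synthetic stand-in data and a closed-form one-predictor least-squares fit; the variable names (stocking density as predictor, harvest in pounds as response) are illustrative assumptions, not the farm's actual data or the paper's R implementation.

```python
import random

def kfold_indices(n, k, seed=42):
    """Shuffle sample indices and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_ols(x, y):
    """Least-squares fit of y = a + b*x (closed form for one predictor)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def cv_rmse(x, y, k=5):
    """Estimate the model's prediction error (RMSE) by k-fold cross-validation:
    each fold is held out once while the model is fit on the remaining folds."""
    folds = kfold_indices(len(x), k)
    sq_errors = []
    for test in folds:
        train = [i for i in range(len(x)) if i not in test]
        a, b = fit_ols([x[i] for i in train], [y[i] for i in train])
        sq_errors += [(y[i] - (a + b * x[i])) ** 2 for i in test]
    return (sum(sq_errors) / len(sq_errors)) ** 0.5

# Synthetic stand-in data: n = 35 harvests, harvest roughly linear in density.
rng = random.Random(0)
density = [rng.uniform(8, 14) for _ in range(35)]         # larvae per m^2 (hypothetical)
harvest = [500 * d + rng.gauss(0, 150) for d in density]  # pounds (hypothetical)

print(round(cv_rmse(density, harvest, k=5), 1))
```

In the study the same cross-validated error estimate is computed for each candidate model (linear regression, CART, random forests, MARS, SVM), and the models with the lowest estimates are preferred for forecasting the next cycles.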
Copyright (c) 2020 Journal of Quantitative Methods for Economics and Business Administration
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Submission of manuscripts implies that the work described has not been published before (except in the form of an abstract or as part of a thesis), that it is not under consideration for publication elsewhere and that, in case of acceptance, the authors agree to the automatic transfer of copyright to the Journal for its publication and dissemination. Authors retain the right to use and share the article for personal or institutional use or for scholarly sharing purposes; in addition, they retain patent, trademark and other intellectual property rights (including research data).
All articles are published in the Journal under the Creative Commons license CC-BY-SA (Attribution-ShareAlike). Commercial use of the work is permitted (always with author attribution), as are derivative works, which must be released under the same license as the original work.
Up to Volume 21, articles in this Journal were licensed under the Creative Commons license CC-BY-SA 3.0 ES. From Volume 22 onward, the Creative Commons license CC-BY-SA 4.0 is used.