{"product_id":"mastering-large-datasets-with-python-john-t-wolohan-9781617296239","title":"Mastering Large Datasets with Python: Parallelize and Distribute Your Python Code","description":"Summary \u003cbr\u003eModern data science solutions need to be clean, easy to read, and scalable. In \u003ci\u003eMastering Large Datasets with Python\u003c\/i\u003e, author J.T. Wolohan teaches you how to take a small project and scale it up using a functionally influenced approach to Python coding. You'll explore methods and built-in Python tools that lend themselves to clarity and scalability, like the high-performing parallelism method, as well as distributed technologies that allow for high data throughput. The abundant hands-on exercises in this practical tutorial will lock in these essential skills for any large-scale data science project. \u003cp\u003e\u003c\/p\u003e \u003cp\u003e\u003c\/p\u003ePurchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications. \u003cp\u003e\u003c\/p\u003e About the technology \u003cbr\u003eProgramming techniques that work well on laptop-sized data can slow to a crawl--or fail altogether--when applied to massive files or distributed datasets. By mastering the powerful map and reduce paradigm, along with the Python-based tools that support it, you can write data-centric applications that scale efficiently without requiring codebase rewrites as your requirements change. \u003cp\u003e\u003c\/p\u003e About the book \u003cbr\u003e\u003ci\u003eMastering Large Datasets with Python\u003c\/i\u003e teaches you to write code that can handle datasets of any size. You'll start with laptop-sized datasets that teach you to parallelize data analysis by breaking large tasks into smaller ones that can run simultaneously. You'll then scale those same programs to industrial-sized datasets on a cluster of cloud servers. 
With the map and reduce paradigm firmly in place, you'll explore tools like Hadoop and PySpark to efficiently process massive distributed datasets, speed up decision-making with machine learning, and simplify your data storage with AWS S3. \u003cp\u003e\u003c\/p\u003e What's inside \u003cbr\u003e \u003cul\u003e \u003cli\u003eAn introduction to the map and reduce paradigm\u003c\/li\u003e \u003cli\u003eParallelization with the multiprocessing module and pathos framework\u003c\/li\u003e \u003cli\u003eHadoop and Spark for distributed computing\u003c\/li\u003e \u003cli\u003eRunning AWS jobs to process large datasets\u003c\/li\u003e \u003c\/ul\u003e \u003cp\u003e\u003c\/p\u003e About the reader \u003cbr\u003eFor Python programmers who need to work faster with more data. \u003cp\u003e\u003c\/p\u003e About the author \u003cbr\u003e\u003cb\u003eJ. T. Wolohan\u003c\/b\u003e is a lead data scientist at Booz Allen Hamilton, and a PhD researcher at Indiana University, Bloomington. \u003cp\u003e\u003c\/p\u003e \u003cp\u003e\u003c\/p\u003eTable of Contents: \u003cp\u003e\u003c\/p\u003ePART 1 \u003cp\u003e\u003c\/p\u003e1 Introduction \u003cp\u003e\u003c\/p\u003e2 Accelerating large dataset work: Map and parallel computing \u003cp\u003e\u003c\/p\u003e3 Function pipelines for mapping complex transformations \u003cp\u003e\u003c\/p\u003e4 Processing large datasets with lazy workflows \u003cp\u003e\u003c\/p\u003e5 Accumulation operations with reduce \u003cp\u003e\u003c\/p\u003e6 Speeding up map and reduce with advanced parallelization \u003cp\u003e\u003c\/p\u003ePART 2 \u003cp\u003e\u003c\/p\u003e7 Processing truly big datasets with Hadoop and Spark \u003cp\u003e\u003c\/p\u003e8 Best practices for large data with Apache Streaming and mrjob \u003cp\u003e\u003c\/p\u003e9 PageRank with map and reduce in PySpark \u003cp\u003e\u003c\/p\u003e10 Faster decision-making with machine learning and PySpark \u003cp\u003e\u003c\/p\u003ePART 3 \u003cp\u003e\u003c\/p\u003e11 
Large datasets in the cloud with Amazon Web Services and S3 \u003cp\u003e\u003c\/p\u003e12 MapReduce in the cloud with Amazon's Elastic MapReduce\u003cbr\u003e\u003cbr\u003e\u003cb\u003eAuthor:\u003c\/b\u003e John T. Wolohan\u003cbr\u003e\u003cb\u003eISBN-10:\u003c\/b\u003e 1617296236\u003cbr\u003e\u003cb\u003eISBN-13:\u003c\/b\u003e 9781617296239\u003cbr\u003e\u003cb\u003ePublisher:\u003c\/b\u003e Manning Publications\u003cbr\u003e\u003cb\u003eLanguage:\u003c\/b\u003e English\u003cbr\u003e\u003cb\u003ePublished:\u003c\/b\u003e 01\/21\/2020\u003cbr\u003e\u003cb\u003ePages:\u003c\/b\u003e 312\u003cbr\u003e\u003cb\u003eFormat:\u003c\/b\u003e Paperback\u003cbr\u003e\u003cb\u003eWeight:\u003c\/b\u003e 1.16 lbs\u003cbr\u003e\u003cb\u003eSize:\u003c\/b\u003e 9.20h x 7.40w x 0.60d","brand":"John T. Wolohan","offers":[{"title":"Paperback","offer_id":43984650174719,"sku":"9781617296239","price":49.99,"currency_code":"USD","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0662\/2982\/9887\/files\/img_3a8d088d-d99d-49a7-a6d2-53ec34f959e3.jpg?v=1683273463","url":"https:\/\/www.whiterainbookhouse.com\/products\/mastering-large-datasets-with-python-john-t-wolohan-9781617296239","provider":"WR Book House","version":"1.0","type":"link"}