
GLIBCXX_3.4.9 Could Not Be Found with Apache Spark

If you run an application with Apache Spark and hit an error like the one below, complaining that GLIBCXX_3.4.9 could not be found, you can avoid it by switching Spark's compression codec from snappy to another codec such as lzf.
...
Caused by: java.lang.UnsatisfiedLinkError: .../snappy-1.0.5.3-1e2f59f6-8ea3-4c03-87fe-dcf4fa75ba6c-libsnappyjava.so: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.9' not found (required by.../snappy-1.0.5.3-1e2f59f6-8ea3-4c03-87fe-dcf4fa75ba6c-libsnappyjava.so)
There are a few ways to pass configuration options to Spark. The simplest is on the command line when launching Spark:
--conf "spark.io.compression.codec=lzf"
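A minimal Python sketch of that command line follows. It assumes a Spark release whose spark-submit accepts --conf, and build_spark_submit_cmd is a hypothetical helper, not part of Spark. The point it illustrates is ordering. spark-submit treats everything after the application file as arguments for the application, so the --conf flag must come before count.py. Running python count.py --conf ... would only hand the flag to the script, not to Spark.

```python
import shlex
from typing import Dict, List, Optional, Sequence


def build_spark_submit_cmd(
    app: str,
    conf: Dict[str, str],
    app_args: Optional[Sequence[str]] = None,
) -> List[str]:
    """Build a spark-submit argument list (hypothetical helper, not a Spark API).

    Spark options such as --conf go before the application file; anything
    after the application file is passed to the application itself.
    """
    cmd = ["spark-submit"]
    for key, value in conf.items():
        # Each Spark property becomes one "--conf key=value" pair.
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app)
    cmd += list(app_args or [])
    return cmd


# Switch Spark's internal compression codec from snappy to lzf.
cmd = build_spark_submit_cmd("count.py", {"spark.io.compression.codec": "lzf"})
print(shlex.join(cmd))
```

Running it prints spark-submit --conf spark.io.compression.codec=lzf count.py. You can paste that into a shell or pass the list to subprocess.run.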
On a side note, you can find which GLIBCXX (and GLIBC) versions your libstdc++ provides by running strings on the library named in the error, e.g. strings /usr/lib64/libstdc++.so.6 | grep GLIBC
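If strings is not available, you can approximate the same check in Python. This sketch scans the raw bytes of the library for GLIBCXX_ version tags, much as strings | grep does. It does not parse the ELF version sections, and the function names glibcxx_versions, provides and versions_in_file are hypothetical.

```python
import re
from typing import List

# Version tags such as GLIBCXX_3.4.9 are stored as plain NUL-terminated strings.
_GLIBCXX_TAG = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)*)")


def glibcxx_versions(data: bytes) -> List[str]:
    """Distinct GLIBCXX version numbers found in the bytes, in version order."""
    found = {m.group(1).decode("ascii") for m in _GLIBCXX_TAG.finditer(data)}
    return sorted(found, key=lambda v: tuple(int(p) for p in v.split(".")))


def provides(data: bytes, version: str) -> bool:
    """True if the exact tag GLIBCXX_<version> occurs in the bytes."""
    return version in glibcxx_versions(data)


def versions_in_file(path: str) -> List[str]:
    """Read a shared library (e.g. libstdc++.so.6) and list its GLIBCXX tags."""
    with open(path, "rb") as f:
        return glibcxx_versions(f.read())


# Small synthetic example instead of a real library file.
sample = b"\x00GLIBCXX_3.4\x00GLIBCXX_3.4.8\x00GLIBCXX_3.4.19\x00GLIBCXX_FORCE_NEW\x00"
print(glibcxx_versions(sample))  # ['3.4', '3.4.8', '3.4.19']
print(provides(sample, "3.4.9"))  # False
```

On the failing machine, versions_in_file("/usr/lib64/libstdc++.so.6") lists the available tags. If 3.4.9 is absent from that list, the result matches the error above.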

Comments:

  1. Great post. I got exactly the same problem, but I am using PySpark. Since I don't have sudo permission, I am thinking about working around snappy. Should I just pass the codec configuration in the shell, like python count.py --conf "spark.io.compression.codec=lzf"?

    1. Thanks. This is a parameter that you need to pass to the Spark runtime. You can either set it as an environment variable or pass it as shown above when starting Spark. See more info at http://spark.apache.org/docs/1.0.1/configuration.html
