Airflow_spark_dag

fhmfl12cg9vubrh80mui.auto.internal
*** Found local files:
***   * /lessons/logs/dag_id=sparkoperator_demo/run_id=manual__2024-03-14T08:58:37.174475+00:00/task_id=spark_submit_task/attempt=1.log
[2024-03-14, 08:58:37 UTC] {taskinstance.py:1103} INFO - Dependencies all met for dep_context=non-requeueable deps ti=<TaskInstance: sparkoperator_demo.spark_submit_task manual__2024-03-14T08:58:37.174475+00:00 [queued]>
[2024-03-14, 08:58:37 UTC] {taskinstance.py:1103} INFO - Dependencies all met for dep_context=requeueable deps ti=<TaskInstance: sparkoperator_demo.spark_submit_task manual__2024-03-14T08:58:37.174475+00:00 [queued]>
[2024-03-14, 08:58:37 UTC] {taskinstance.py:1308} INFO - Starting attempt 1 of 1
[2024-03-14, 08:58:37 UTC] {taskinstance.py:1327} INFO - Executing <Task(SparkSubmitOperator): spark_submit_task> on 2024-03-14 08:58:37.174475+00:00
[2024-03-14, 08:58:37 UTC] {standard_task_runner.py:57} INFO - Started process 7696 to run task
[2024-03-14, 08:58:37 UTC] {standard_task_runner.py:84} INFO - Running: ['airflow', 'tasks', 'run', 'sparkoperator_demo', 'spark_submit_task', 'manual__2024-03-14T08:58:37.174475+00:00', '--job-id', '52', '--raw', '--subdir', 'DAGS_FOLDER/spark_dag.py', '--cfg-path', '/tmp/tmpno54b_ek']
[2024-03-14, 08:58:37 UTC] {standard_task_runner.py:85} INFO - Job 52: Subtask spark_submit_task
[2024-03-14, 08:58:37 UTC] {task_command.py:410} INFO - Running <TaskInstance: sparkoperator_demo.spark_submit_task manual__2024-03-14T08:58:37.174475+00:00 [running]> on host fhmfl12cg9vubrh80mui.auto.internal
[2024-03-14, 08:58:38 UTC] {taskinstance.py:1545} INFO - Exporting env vars: AIRFLOW_CTX_DAG_OWNER='airflow' AIRFLOW_CTX_DAG_ID='sparkoperator_demo' AIRFLOW_CTX_TASK_ID='spark_submit_task' AIRFLOW_CTX_EXECUTION_DATE='2024-03-14T08:58:37.174475+00:00' AIRFLOW_CTX_TRY_NUMBER='1' AIRFLOW_CTX_DAG_RUN_ID='manual__2024-03-14T08:58:37.174475+00:00'
[2024-03-14, 08:58:38 UTC] {base.py:73} INFO - Using connection ID 'yarn_spark' for task execution.
[2024-03-14, 08:58:38 UTC] {spark_submit.py:341} INFO - Spark-Submit cmd: spark-submit --master yarn --conf spark.driver.maxResultSize=20g --executor-cores 2 --executor-memory 2g --name arrow-spark /lessons/partition.py 2020-05-01 /user/master/data/events /user/kotlyarovb/data/events
[2024-03-14, 08:58:39 UTC] {spark_submit.py:492} INFO - SLF4J: Class path contains multiple SLF4J bindings.
[2024-03-14, 08:58:39 UTC] {spark_submit.py:492} INFO - SLF4J: Found binding in [jar:file:/usr/lib/spark/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
[2024-03-14, 08:58:39 UTC] {spark_submit.py:492} INFO - SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
[2024-03-14, 08:58:39 UTC] {spark_submit.py:492} INFO - SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
[2024-03-14, 08:58:39 UTC] {spark_submit.py:492} INFO - SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
[2024-03-14, 08:58:39 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:39,535 WARN util.Utils: Your hostname, fhmfl12cg9vubrh80mui resolves to a loopback address: 127.0.1.1; using 172.16.0.24 instead (on interface eth0)
[2024-03-14, 08:58:39 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:39,536 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
[2024-03-14, 08:58:40 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:40,805 INFO spark.SparkContext: Running Spark version 3.0.2
[2024-03-14, 08:58:40 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:40,857 INFO resource.ResourceUtils: ==============================================================
[2024-03-14, 08:58:40 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:40,858 INFO resource.ResourceUtils: Resources for spark.driver:
[2024-03-14, 08:58:40 UTC] {spark_submit.py:492} INFO -
[2024-03-14, 08:58:40 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:40,859 INFO resource.ResourceUtils: ==============================================================
[2024-03-14, 08:58:40 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:40,859 INFO spark.SparkContext: Submitted application: EventsPartitioningJob-2020-05-01
[2024-03-14, 08:58:40 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:40,926 INFO spark.SecurityManager: Changing view acls to: kotlyarovb
[2024-03-14, 08:58:40 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:40,926 INFO spark.SecurityManager: Changing modify acls to: kotlyarovb
[2024-03-14, 08:58:40 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:40,926 INFO spark.SecurityManager: Changing view acls groups to:
[2024-03-14, 08:58:40 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:40,927 INFO spark.SecurityManager: Changing modify acls groups to:
[2024-03-14, 08:58:40 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:40,927 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(kotlyarovb); groups with view permissions: Set(); users  with modify permissions: Set(kotlyarovb); groups with modify permissions: Set()
[2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,165 INFO util.Utils: Successfully started service 'sparkDriver' on port 44065.
[2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,204 INFO spark.SparkEnv: Registering MapOutputTracker
[2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,241 INFO spark.SparkEnv: Registering BlockManagerMaster
[2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,257 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
[2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,257 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
[2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,292 INFO spark.SparkEnv: Registering BlockManagerMasterHeartbeat
[2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,307 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-64908467-2f69-49fa-ab31-24f1aa3616f5
[2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,332 INFO memory.MemoryStore: MemoryStore started with capacity 265.5 MiB
[2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,375 INFO spark.SparkEnv: Registering OutputCommitCoordinator
[2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,507 INFO util.log: Logging initialized @3082ms to org.sparkproject.jetty.util.log.Slf4jLog
[2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,602 INFO server.Server: jetty-9.4.34.v20201102; built: 2020-11-02T14:15:39.302Z; git: e46af88704a893fc12cb0e3bf46e2c7b48a009e7; jvm 1.8.0_392-8u392-ga-1~20.04-b08
[2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,629 INFO server.Server: Started @3204ms
[2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,681 INFO server.AbstractConnector: Started ServerConnector@7885432{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
[2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,681 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
[2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,710 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@78039eb4{/jobs,null,AVAILABLE,@Spark}
[2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,713 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@54095d19{/jobs/json,null,AVAILABLE,@Spark}
[2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,714 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7ba62011{/jobs/job,null,AVAILABLE,@Spark}
[2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,715 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3c97a623{/jobs/job/json,null,AVAILABLE,@Spark}
[2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,716 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3134a0e{/stages,null,AVAILABLE,@Spark}
[2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,716 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@12f79b1d{/stages/json,null,AVAILABLE,@Spark}
[2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,717 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@20851bbb{/stages/stage,null,AVAILABLE,@Spark}
[2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,718 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@53ac6f58{/stages/stage/json,null,AVAILABLE,@Spark}
[2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,719 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@775dd3de{/stages/pool,null,AVAILABLE,@Spark}
[2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,720 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7b813cf6{/stages/pool/json,null,AVAILABLE,@Spark}
[2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,720 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5f5db1d6{/storage,null,AVAILABLE,@Spark}
[2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,721 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5c3c754b{/storage/json,null,AVAILABLE,@Spark}
[2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,722 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@12e9e1f0{/storage/rdd,null,AVAILABLE,@Spark}
[2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,722 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4755928a{/storage/rdd/json,null,AVAILABLE,@Spark}
[2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,723 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7b3eaf80{/environment,null,AVAILABLE,@Spark}
[2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,724 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@9c64abf{/environment/json,null,AVAILABLE,@Spark}
[2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,725 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@866e118{/executors,null,AVAILABLE,@Spark}
[2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,726 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@338f1432{/executors/json,null,AVAILABLE,@Spark}
[2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,726 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5a450036{/executors/threadDump,null,AVAILABLE,@Spark}
[2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,727 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2097c661{/executors/threadDump/json,null,AVAILABLE,@Spark}
[2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,738 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@15ef8888{/static,null,AVAILABLE,@Spark}
[2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,739 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@67a239ee{/,null,AVAILABLE,@Spark}
[2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,740 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@795aa966{/api,null,AVAILABLE,@Spark}
[2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,741 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5682c239{/jobs/job/kill,null,AVAILABLE,@Spark}
[2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,742 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1c9db4f7{/stages/stage/kill,null,AVAILABLE,@Spark}
[2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,745 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://172.16.0.24:4040
[2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,966 WARN util.Utils: spark.executor.instances less than spark.dynamicAllocation.minExecutors is invalid, ignoring its setting, please update your configs.
[2024-03-14, 08:58:41 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:41,968 INFO util.Utils: Using initial executors = 1, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
[2024-03-14, 08:58:42 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:42,052 INFO client.RMProxy: Connecting to ResourceManager at rc1a-dataproc-m-dg5lgqqm7jju58f9.mdb.yandexcloud.net/172.16.0.14:8032
[2024-03-14, 08:58:42 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:42,279 INFO client.AHSProxy: Connecting to Application History server at rc1a-dataproc-m-dg5lgqqm7jju58f9.mdb.yandexcloud.net/172.16.0.14:10200
[2024-03-14, 08:58:42 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:42,348 INFO yarn.Client: Requesting a new application from cluster with 5 NodeManagers
[2024-03-14, 08:58:42 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:42,892 INFO conf.Configuration: resource-types.xml not found
[2024-03-14, 08:58:42 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:42,893 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
[2024-03-14, 08:58:42 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:42,914 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (37888 MB per container)
[2024-03-14, 08:58:42 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:42,915 INFO yarn.Client: Will allocate AM container, with 10700 MB memory including 972 MB overhead
[2024-03-14, 08:58:42 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:42,915 INFO yarn.Client: Setting up container launch context for our AM
[2024-03-14, 08:58:42 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:42,918 INFO yarn.Client: Setting up the launch environment for our AM container
[2024-03-14, 08:58:42 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:42,929 INFO yarn.Client: Preparing resources for our AM container
[2024-03-14, 08:58:42 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:42,999 INFO yarn.Client: Uploading resource file:/usr/lib/spark/python/lib/pyspark.zip -> hdfs://rc1a-dataproc-m-dg5lgqqm7jju58f9.mdb.yandexcloud.net/user/kotlyarovb/.sparkStaging/application_1692104774102_37394/pyspark.zip
[2024-03-14, 08:58:43 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:43,194 INFO yarn.Client: Uploading resource file:/usr/lib/spark/python/lib/py4j-0.10.9-src.zip -> hdfs://rc1a-dataproc-m-dg5lgqqm7jju58f9.mdb.yandexcloud.net/user/kotlyarovb/.sparkStaging/application_1692104774102_37394/py4j-0.10.9-src.zip
[2024-03-14, 08:58:43 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:43,371 INFO yarn.Client: Uploading resource file:/tmp/spark-268a5017-4800-45bc-b8cc-04dcd63c2d45/__spark_conf__7761552592225624222.zip -> hdfs://rc1a-dataproc-m-dg5lgqqm7jju58f9.mdb.yandexcloud.net/user/kotlyarovb/.sparkStaging/application_1692104774102_37394/__spark_conf__.zip
[2024-03-14, 08:58:43 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:43,426 INFO spark.SecurityManager: Changing view acls to: kotlyarovb
[2024-03-14, 08:58:43 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:43,426 INFO spark.SecurityManager: Changing modify acls to: kotlyarovb
[2024-03-14, 08:58:43 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:43,426 INFO spark.SecurityManager: Changing view acls groups to:
[2024-03-14, 08:58:43 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:43,426 INFO spark.SecurityManager: Changing modify acls groups to:
[2024-03-14, 08:58:43 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:43,426 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(kotlyarovb); groups with view permissions: Set(); users  with modify permissions: Set(kotlyarovb); groups with modify permissions: Set()
[2024-03-14, 08:58:43 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:43,449 INFO yarn.Client: Submitting application application_1692104774102_37394 to ResourceManager
[2024-03-14, 08:58:43 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:43,487 INFO impl.YarnClientImpl: Submitted application application_1692104774102_37394
[2024-03-14, 08:58:44 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:44,491 INFO yarn.Client: Application report for application_1692104774102_37394 (state: ACCEPTED)
[2024-03-14, 08:58:44 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:44,494 INFO yarn.Client:
[2024-03-14, 08:58:44 UTC] {spark_submit.py:492} INFO - client token: N/A
[2024-03-14, 08:58:44 UTC] {spark_submit.py:492} INFO - diagnostics: AM container is launched, waiting for AM container to Register with RM
[2024-03-14, 08:58:44 UTC] {spark_submit.py:492} INFO - ApplicationMaster host: N/A
[2024-03-14, 08:58:44 UTC] {spark_submit.py:492} INFO - ApplicationMaster RPC port: -1
[2024-03-14, 08:58:44 UTC] {spark_submit.py:492} INFO - queue: root.kotlyarovb
[2024-03-14, 08:58:44 UTC] {spark_submit.py:492} INFO - start time: 1710406723463
[2024-03-14, 08:58:44 UTC] {spark_submit.py:492} INFO - final status: UNDEFINED
[2024-03-14, 08:58:44 UTC] {spark_submit.py:492} INFO - tracking URL: http://rc1a-dataproc-m-dg5lgqqm7jju58f9.mdb.yandexcloud.net:8088/proxy/application_1692104774102_37394/
[2024-03-14, 08:58:44 UTC] {spark_submit.py:492} INFO - user: kotlyarovb
[2024-03-14, 08:58:45 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:45,496 INFO yarn.Client: Application report for application_1692104774102_37394 (state: ACCEPTED)
[2024-03-14, 08:58:46 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:46,498 INFO yarn.Client: Application report for application_1692104774102_37394 (state: ACCEPTED)
[2024-03-14, 08:58:47 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:47,500 INFO yarn.Client: Application report for application_1692104774102_37394 (state: ACCEPTED)
[2024-03-14, 08:58:48 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:48,503 INFO yarn.Client: Application report for application_1692104774102_37394 (state: ACCEPTED)
[2024-03-14, 08:58:49 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:49,505 INFO yarn.Client: Application report for application_1692104774102_37394 (state: ACCEPTED)
[2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:50,281 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> rc1a-dataproc-m-dg5lgqqm7jju58f9.mdb.yandexcloud.net, PROXY_URI_BASES -> http://rc1a-dataproc-m-dg5lgqqm7jju58f9.mdb.yandexcloud.net:8088/proxy/application_1692104774102_37394), /proxy/application_1692104774102_37394
[2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:50,507 INFO yarn.Client: Application report for application_1692104774102_37394 (state: RUNNING)
[2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:50,508 INFO yarn.Client:
[2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - client token: N/A
[2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - diagnostics: N/A
[2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - ApplicationMaster host: 172.16.0.34
[2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - ApplicationMaster RPC port: -1
[2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - queue: root.kotlyarovb
[2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - start time: 1710406723463
[2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - final status: UNDEFINED
[2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - tracking URL: http://rc1a-dataproc-m-dg5lgqqm7jju58f9.mdb.yandexcloud.net:8088/proxy/application_1692104774102_37394/
[2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - user: kotlyarovb
[2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:50,509 INFO cluster.YarnClientSchedulerBackend: Application application_1692104774102_37394 has started running.
[2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:50,518 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 33077.
[2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:50,518 INFO netty.NettyBlockTransferService: Server created on 172.16.0.24:33077
[2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:50,520 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
[2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:50,535 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 172.16.0.24, 33077, None)
[2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:50,539 INFO storage.BlockManagerMasterEndpoint: Registering block manager 172.16.0.24:33077 with 265.5 MiB RAM, BlockManagerId(driver, 172.16.0.24, 33077, None)
[2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:50,542 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 172.16.0.24, 33077, None)
[2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:50,543 INFO storage.BlockManager: external shuffle service port = 7337
[2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:50,544 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, 172.16.0.24, 33077, None)
[2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:50,689 INFO ui.ServerInfo: Adding filter to /metrics/json: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
[2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:50,691 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7d523710{/metrics/json,null,AVAILABLE,@Spark}
[2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:50,733 INFO history.SingleEventLogFileWriter: Logging events to hdfs:/var/log/spark/apps/application_1692104774102_37394.inprogress
[2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:50,967 WARN util.Utils: spark.executor.instances less than spark.dynamicAllocation.minExecutors is invalid, ignoring its setting, please update your configs.
[2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:50,967 INFO util.Utils: Using initial executors = 1, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
[2024-03-14, 08:58:50 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:50,978 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
[2024-03-14, 08:58:52 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:52,193 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
[2024-03-14, 08:58:56 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:56,382 INFO resource.ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 2, script: , vendor: , memory -> name: memory, amount: 2048, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
[2024-03-14, 08:58:57 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:57,495 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (172.16.0.25:51138) with ID 1
[2024-03-14, 08:58:57 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:57,505 INFO dynalloc.ExecutorMonitor: New executor 1 has registered (new total is 1)
[2024-03-14, 08:58:57 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:57,564 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
[2024-03-14, 08:58:57 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:57,707 INFO storage.BlockManagerMasterEndpoint: Registering block manager rc1a-dataproc-d-1njdvt74bdkw9who.mdb.yandexcloud.net:41237 with 1007.8 MiB RAM, BlockManagerId(1, rc1a-dataproc-d-1njdvt74bdkw9who.mdb.yandexcloud.net, 41237, None)
[2024-03-14, 08:58:57 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:57,782 INFO internal.SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/opt/airflow/spark-warehouse').
[2024-03-14, 08:58:57 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:57,782 INFO internal.SharedState: Warehouse path is 'file:/opt/airflow/spark-warehouse'.
[2024-03-14, 08:58:57 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:57,795 INFO ui.ServerInfo: Adding filter to /SQL: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
[2024-03-14, 08:58:57 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:57,796 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6499fcc4{/SQL,null,AVAILABLE,@Spark}
[2024-03-14, 08:58:57 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:57,797 INFO ui.ServerInfo: Adding filter to /SQL/json: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
[2024-03-14, 08:58:57 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:57,798 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@dd9a2c3{/SQL/json,null,AVAILABLE,@Spark}
[2024-03-14, 08:58:57 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:57,799 INFO ui.ServerInfo: Adding filter to /SQL/execution: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
[2024-03-14, 08:58:57 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:57,799 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1082fca1{/SQL/execution,null,AVAILABLE,@Spark}
[2024-03-14, 08:58:57 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:57,800 INFO ui.ServerInfo: Adding filter to /SQL/execution/json: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
[2024-03-14, 08:58:57 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:57,801 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7b8eab3b{/SQL/execution/json,null,AVAILABLE,@Spark}
[2024-03-14, 08:58:57 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:57,801 INFO ui.ServerInfo: Adding filter to /static/sql: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
[2024-03-14, 08:58:57 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:57,803 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4eee3791{/static/sql,null,AVAILABLE,@Spark}
[2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - Traceback (most recent call last):
[2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - File "/lessons/partition.py", line 25, in <module>
[2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - main()
[2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - File "/lessons/partition.py", line 17, in main
[2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - events = sql.read.json(f"{base_input_path}/date={date}")
[2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 300, in json
[2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - File "/usr/lib/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1304, in __call__
[2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 134, in deco
[2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - File "<string>", line 3, in raise_from
[2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - pyspark.sql.utils.AnalysisException: Path does not exist: hdfs://rc1a-dataproc-m-dg5lgqqm7jju58f9.mdb.yandexcloud.net/user/master/data/events/date=2020-05-01;
[2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:58,545 INFO spark.SparkContext: Invoking stop() from shutdown hook
[2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:58,556 INFO server.AbstractConnector: Stopped Spark@7885432{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
[2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:58,558 INFO ui.SparkUI: Stopped Spark web UI at http://172.16.0.24:4040
[2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:58,562 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
[2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:58,597 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
[2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:58,598 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
[2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:58,606 INFO cluster.YarnClientSchedulerBackend: YARN client scheduler backend Stopped
[2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:58,633 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
[2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:58,642 INFO memory.MemoryStore: MemoryStore cleared
[2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:58,642 INFO storage.BlockManager: BlockManager stopped
[2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:58,651 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
[2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:58,655 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
[2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:58,663 INFO spark.SparkContext: Successfully stopped SparkContext
[2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:58,663 INFO util.ShutdownHookManager: Shutdown hook called
[2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:58,664 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-268a5017-4800-45bc-b8cc-04dcd63c2d45
[2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:58,667 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-21e5dfbf-8286-42ae-b2b2-f559faffa2e4
[2024-03-14, 08:58:58 UTC] {spark_submit.py:492} INFO - 2024-03-14 08:58:58,669 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-268a5017-4800-45bc-b8cc-04dcd63c2d45/pyspark-bbb1e6d6-49d4-4f78-9644-fd62fa7045cd
[2024-03-14, 08:58:59 UTC] {taskinstance.py:1824} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/airflow/providers/apache/spark/operators/spark_submit.py", line 157, in execute
    self._hook.submit(self._application)
  File "/usr/local/lib/python3.8/dist-packages/airflow/providers/apache/spark/hooks/spark_submit.py", line 423, in submit
    raise AirflowException(
airflow.exceptions.AirflowException: Cannot execute: spark-submit --master yarn --conf spark.driver.maxResultSize=20g --executor-cores 2 --executor-memory 2g --name arrow-spark /lessons/partition.py 2020-05-01 /user/master/data/events /user/kotlyarovb/data/events. Error code is: 1.
[2024-03-14, 08:58:59 UTC] {taskinstance.py:1345} INFO - Marking task as FAILED. dag_id=sparkoperator_demo, task_id=spark_submit_task, execution_date=20240314T085837, start_date=20240314T085837, end_date=20240314T085859
[2024-03-14, 08:58:59 UTC] {standard_task_runner.py:104} ERROR - Failed to execute job 52 for task spark_submit_task (Cannot execute: spark-submit --master yarn --conf spark.driver.maxResultSize=20g --executor-cores 2 --executor-memory 2g --name arrow-spark /lessons/partition.py 2020-05-01 /user/master/data/events /user/kotlyarovb/data/events. Error code is: 1.; 7696)
[2024-03-14, 08:58:59 UTC] {local_task_job_runner.py:225} INFO - Task exited with return code 1
[2024-03-14, 08:58:59 UTC] {taskinstance.py:2653} INFO - 0 downstream tasks scheduled from follow-on schedule check
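
The DAG file itself (DAGS_FOLDER/spark_dag.py) is not included in the paste, but the log pins down most of it. A minimal sketch that would reproduce the spark-submit command shown above follows; the dag_id, task_id, connection ID 'yarn_spark', application path, Spark conf, and the three positional arguments come straight from the log, while start_date, schedule, and catchup are assumptions (the run in the log was triggered manually):

# Minimal sketch of spark_dag.py, reconstructed from the log above.
# dag_id, task_id, conn_id, application, conf, and application_args are
# taken from the log; start_date, schedule_interval, and catchup are
# assumptions.
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="sparkoperator_demo",
    start_date=datetime(2024, 3, 1),  # assumed
    schedule_interval=None,           # run in the log is manual__...
    catchup=False,
) as dag:
    spark_submit_task = SparkSubmitOperator(
        task_id="spark_submit_task",
        conn_id="yarn_spark",  # connection supplies --master yarn
        application="/lessons/partition.py",
        application_args=[
            "2020-05-01",                    # date to process
            "/user/master/data/events",      # base input path
            "/user/kotlyarovb/data/events",  # base output path
        ],
        conf={"spark.driver.maxResultSize": "20g"},
        executor_cores=2,
        executor_memory="2g",
    )

The --name arrow-spark seen in the command is the operator's default application name, so it does not need to be set explicitly.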
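The Python traceback likewise constrains /lessons/partition.py: the read of the dated input partition happens at line 17 inside main(), the module calls main() at line 25, and the submitted application name was EventsPartitioningJob-2020-05-01. A hypothetical reconstruction consistent with those details follows; the argument handling, the output step, and the event_type partitioning column are assumptions about what such a job typically does:

# Hypothetical sketch of /lessons/partition.py, inferred from the traceback.
# The variable name 'sql' and the read.json call are visible in the log;
# everything about the write step is an assumption.
import sys

from pyspark.sql import SparkSession


def main():
    date, base_input_path, base_output_path = sys.argv[1:4]

    sql = (
        SparkSession.builder
        .appName(f"EventsPartitioningJob-{date}")  # name seen in the log
        .getOrCreate()
    )

    # This is the line that raised AnalysisException: the input directory
    # /user/master/data/events/date=2020-05-01 does not exist on HDFS.
    events = sql.read.json(f"{base_input_path}/date={date}")

    # Assumed output step: rewrite the day's events under the user's path,
    # partitioned by a hypothetical event_type column.
    events.write.mode("overwrite").partitionBy("event_type") \
        .parquet(f"{base_output_path}/date={date}")


if __name__ == "__main__":
    main()

The failure is therefore a missing input path rather than a Spark or Airflow misconfiguration: sql.read.json raises AnalysisException because hdfs://.../user/master/data/events/date=2020-05-01 does not exist, the driver exits with code 1, and the SparkSubmitHook surfaces that as an AirflowException. Before rerunning, the path can be checked with hdfs dfs -test -d /user/master/data/events/date=2020-05-01 (exit code 0 means the directory exists).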