The "Goldilocks" perspective:
Spoiler: it involves, to a large extent, Lisp.
In a sense, 20 years after the web smashed the language landscape to pieces, we've come to a place where the criteria for a "fast language" have been nudged forward, over the line that was holding back a lot of techniques that used to be considered "too expensive to use in production". Techniques such as ubiquitous virtual dispatch, garbage collection, and latent types became "acceptable costs", culturally.
ArchGDAL.jl (similar to rasterio and fiona)
GeoDataFrames.jl (similar to geopandas)

x = 1
x, typeof(x)
(1,Int64)
x = 1.0
x, typeof(x)
(1.0,Float64)
x = "1.0"
x, typeof(x)
("1.0",ASCIIString)
x = true
x, typeof(x)
(true,Bool)
for x in (Float64(1),
          string(1.0),
          parse(Float64, "1.0"),
          parse(Int, "1"),
          Int(true))
    @show x, typeof(x)
end
(x,typeof(x)) = (1.0,Float64)
(x,typeof(x)) = ("1.0",ASCIIString)
(x,typeof(x)) = (1.0,Float64)
(x,typeof(x)) = (1,Int64)
(x,typeof(x)) = (1,Int64)
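The `parse` calls above succeed because every string is a valid number. As a minimal sketch, assuming Julia 1.x (newer than the version that produced the recorded outputs), `tryparse` returns `nothing` instead of throwing when the input is not a number. The helper name `tofloat` and its `default` keyword are hypothetical, chosen only for illustration.

```julia
# Hypothetical helper: convert a string to Float64, falling back to a default.
function tofloat(s::AbstractString; default = NaN)
    v = tryparse(Float64, s)      # `nothing` when `s` is not a valid float
    return v === nothing ? default : v
end

tofloat("1.0")                     # parses normally
tofloat("maptime")                 # falls back to NaN
tofloat("maptime"; default = 0.0)  # falls back to the supplied default
```

This keeps a loop over messy input (for example, attribute columns read from a file) from stopping at the first malformed value.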
for word in ("julia", "at", "map", "time", "boston")
    if contains(word, "t")
        println("t: $word")
    elseif contains(word, "a")
        println("a: $word")
    else
        println("*: $word")
    end
end
a: julia
t: at
a: map
t: time
t: boston
function printword(word)
    if contains(word, "t") println("t: $word")
    elseif contains(word, "a") println("a: $word")
    else println("*: $word")
    end
end
for word in ("julia", "at", "map", "time", "boston")
    printword(word)
end
a: julia
t: at
a: map
t: time
t: boston
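One possible variation on `printword`, sketched here under the assumption of Julia 1.x: return the label instead of printing it, so the branching logic can be checked separately from the output. The name `wordlabel` is hypothetical. It uses `occursin(needle, haystack)`, which takes its arguments in the opposite order from the `contains(word, "t")` calls above.

```julia
# Hypothetical helper: classify a word by the first matching letter.
function wordlabel(word::AbstractString)
    if occursin("t", word)
        return "t"
    elseif occursin("a", word)
        return "a"
    else
        return "*"
    end
end

for word in ("julia", "at", "map", "time", "boston")
    println(wordlabel(word), ": ", word)
end
```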
for x in (("julia", "at", "map", "time", "boston"),
          ["julia", "at", "map", "time", "boston"],
          Dict("language" => "julia",
               "event" => "maptime",
               "location" => "boston"))
    @show x
    for item in x
        println("\t $item")
    end
end
x = ("julia","at","map","time","boston")
     julia
     at
     map
     time
     boston
x = ASCIIString["julia","at","map","time","boston"]
     julia
     at
     map
     time
     boston
x = Dict("event"=>"maptime","location"=>"boston","language"=>"julia")
     "event"=>"maptime"
     "location"=>"boston"
     "language"=>"julia"
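The last collection above is a `Dict`, and each item it yields is a key/value `Pair`. A small sketch, assuming Julia 1.x, of destructuring those pairs in the loop header. Note that the order in which a `Dict` is iterated is not guaranteed (the recorded output above does not follow insertion order), so the sketch sorts the keys when a stable order matters.

```julia
d = Dict("language" => "julia",
         "event" => "maptime",
         "location" => "boston")

# Each item is a Pair; destructure it into key and value.
for (key, value) in d
    println(key, " => ", value)
end

# Sort the keys when a reproducible order is needed.
sortedkeys = sort(collect(keys(d)))
for key in sortedkeys
    println(key, " => ", d[key])
end
```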
For more, check out the following resources:
work in progress at https://github.com/visr/GDAL.jl
using GDAL
GDAL.allregister() # register the drivers
dataset = GDAL.openex("data/rodents.geojson", GDAL.GDAL_OF_VECTOR, C_NULL, C_NULL, C_NULL)
Ptr{GDAL.GDALDatasetH} @0x00007fd704d161f0
layer = GDAL.datasetgetlayerbyname(dataset, "OGRGeoJSON")
Ptr{GDAL.OGRLayerH} @0x00007fd704d117a0
feature = GDAL.getnextfeature(layer) # first feature
@show GDAL.getfieldasinteger64(feature, 0) # id
@show GDAL.getfieldasstring(feature, 1) # OPEN_DT
GDAL.destroy(feature)
GDAL.getfieldasinteger64(feature,0) = 48
GDAL.getfieldasstring(feature,1) = "11/09/2011 02:54:02 PM"
feature = GDAL.getnextfeature(layer) # second feature
@show GDAL.getfieldasinteger64(feature, 0)
@show GDAL.getfieldasstring(feature, 1)
GDAL.destroy(feature)
GDAL.getfieldasinteger64(feature,0) = 67
GDAL.getfieldasstring(feature,1) = "03/05/2014 08:46:53 AM"
GDAL.close(dataset)
GDAL.destroydrivermanager()
work-in-progress at https://github.com/yeesian/ArchGDAL.jl
import ArchGDAL
@time ArchGDAL.registerdrivers() do
    ArchGDAL.read("data/rodents.geojson") do dataset
        println(dataset)
    end
end
GDAL Dataset (Driver: GeoJSON/GeoJSON)
File(s): data/rodents.geojson
Number of feature layers: 1
  Layer 0: OGRGeoJSON (wkbPoint), nfeatures = 10756
  0.520685 seconds (392.96 k allocations: 17.541 MB, 2.62% gc time)
@time ArchGDAL.registerdrivers() do
    ArchGDAL.read("data/rodents.geojson") do dataset
        for (i,feature) in enumerate(ArchGDAL.getlayer(dataset, 0))
            i > 2 && break
            println(feature)
        end
    end
end
Feature
  (index 0) geom => POINT
  (index 0) id => 48
  (index 1) OPEN_DT => 11/09/2011 02:54:02 PM
Feature
  (index 0) geom => POINT
  (index 0) id => 67
  (index 1) OPEN_DT => 03/05/2014 08:46:53 AM
  0.294920 seconds (97.84 k allocations: 4.423 MB)
experiment-in-progress at https://github.com/yeesian/GeoDataFrames.jl
using GeoDataFrames; const GD = GeoDataFrames
neighborhoods = GD.read("data/Boston_Neighborhoods.shp")
DataFrames.head(neighborhoods)
| | geometry0 | Acres | Name | OBJECTID | SHAPE_area | SHAPE_len |
---|---|---|---|---|---|---|
1 | Geometry: MULTIPOLYGON (((-71.1259265672231 42.2720044534673 ... 34))) | 1605.56181523 | Roslindale | 1 | 6.99382726723e7 | 53563.9125971 |
2 | Geometry: POLYGON ((-71.104991583 42.3260930173557,-71.10487 ... 557)) | 2519.23531679 | Jamaica Plain | 2 | 1.09737890396e8 | 56349.9371614 |
3 | Geometry: POLYGON ((-71.0904337145803 42.3357612955289,-71.0 ... 289)) | 350.85216018 | Mission Hill | 3 | 1.52831200976e7 | 17918.7241135 |
4 | Geometry: POLYGON ((-71.098108339852 42.3367217099475,-71.09 ... 475)) | 188.61119227 | Longwood Medical Area | 4 | 8.21590353542e6 | 11908.7571476 |
5 | Geometry: POLYGON ((-71.0666286565676 42.3487740128554,-71.0 ... 554)) | 26.53973299 | Bay Village | 5 | 1.15607076939e6 | 4650.63549327 |
6 | Geometry: POLYGON ((-71.0583778032908 42.3498224214285,-71.0 ... 285)) | 15.63984554 | Leather District | 6 | 681271.672013 | 3237.14053697 |
rodents = GD.read("data/rodents.geojson")
DataFrames.head(rodents)
| | geometry0 | id | OPEN_DT |
---|---|---|---|
1 | Geometry: POINT (-71.0721 42.3224) | 48 | 11/09/2011 02:54:02 PM |
2 | Geometry: POINT (-71.0617 42.3477) | 67 | 03/05/2014 08:46:53 AM |
3 | Geometry: POINT (-71.0464 42.3357) | 86 | 10/19/2011 03:21:20 PM |
4 | Geometry: POINT (-71.0168 42.3834) | 95 | 05/31/2012 01:37:49 PM |
5 | Geometry: POINT (-71.1059 42.3099) | 96 | 11/20/2015 09:37:19 AM |
6 | Geometry: POINT (-71.0765 42.342) | 186 | 09/20/2012 09:11:27 AM |
function numrodents(neighborhood)
    sum([ArchGDAL.contains(neighborhood.ptr, r.ptr) for r in rodents[:geometry0]])
end
@time numrodents(neighborhoods[1,:geometry0])
1.530536 seconds (174.90 k allocations: 5.028 MB, 0.62% gc time)
156
@time neighborhoods[:numrodents] = Int[numrodents(n) for n in neighborhoods[:geometry0]]
DataFrames.head(neighborhoods)
42.293394 seconds (2.75 M allocations: 55.753 MB, 0.03% gc time)
| | geometry0 | Acres | Name | OBJECTID | SHAPE_area | SHAPE_len | numrodents |
---|---|---|---|---|---|---|---|
1 | Geometry: MULTIPOLYGON (((-71.1259265672231 42.2720044534673 ... 34))) | 1605.56181523 | Roslindale | 1 | 6.99382726723e7 | 53563.9125971 | 156 |
2 | Geometry: POLYGON ((-71.104991583 42.3260930173557,-71.10487 ... 557)) | 2519.23531679 | Jamaica Plain | 2 | 1.09737890396e8 | 56349.9371614 | 491 |
3 | Geometry: POLYGON ((-71.0904337145803 42.3357612955289,-71.0 ... 289)) | 350.85216018 | Mission Hill | 3 | 1.52831200976e7 | 17918.7241135 | 157 |
4 | Geometry: POLYGON ((-71.098108339852 42.3367217099475,-71.09 ... 475)) | 188.61119227 | Longwood Medical Area | 4 | 8.21590353542e6 | 11908.7571476 | 4 |
5 | Geometry: POLYGON ((-71.0666286565676 42.3487740128554,-71.0 ... 554)) | 26.53973299 | Bay Village | 5 | 1.15607076939e6 | 4650.63549327 | 81 |
6 | Geometry: POLYGON ((-71.0583778032908 42.3498224214285,-71.0 ... 285)) | 15.63984554 | Leather District | 6 | 681271.672013 | 3237.14053697 | 10 |
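To make the counting step concrete, here is a self-contained sketch of a point-in-polygon count in plain Julia (1.x assumed). It is not what `ArchGDAL.contains` does internally, since that call goes through the GDAL library. It swaps in the textbook ray-casting (even-odd) test on a single ring, and it ignores holes, multipolygons, and points that lie exactly on the boundary. The names `inring` and `countinside` and the sample data are hypothetical.

```julia
# Ray-casting (even-odd) test: is the point (px, py) inside the polygon `ring`?
# `ring` is a vector of (x, y) tuples; the last vertex need not repeat the first.
function inring(px, py, ring)
    inside = false
    n = length(ring)
    j = n
    for i in 1:n
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Toggle when the edge (j -> i) straddles the horizontal line through
        # the point and the crossing lies to the right of the point.
        if (yi > py) != (yj > py) &&
           px < (xj - xi) * (py - yi) / (yj - yi) + xi
            inside = !inside
        end
        j = i
    end
    return inside
end

# Hypothetical helper mirroring numrodents: count the points inside one ring.
countinside(ring, points) = count(p -> inring(p[1], p[2], ring), points)

square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
points = [(1.0, 1.0), (5.0, 1.0), (2.0, 3.5), (-1.0, 2.0)]
countinside(square, points)  # 2: only (1.0, 1.0) and (2.0, 3.5) are inside
```

Like `numrodents`, this tests every point against every polygon, so the work grows with the number of neighborhoods times the number of points.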
@time GeoDataFrames.plot(neighborhoods,
    label=:Name,
    plt = Plots.plot(bg = :white))
[Plots.jl] Initializing backend: pyplot
22.263985 seconds (18.15 M allocations: 793.389 MB, 1.63% gc time)
for reference: https://juliaplots.github.io/backends/
using Plots
plotlyjs()
WARNING: using Plots.PyPlot in module Main conflicts with an existing identifier.
Plots.PlotlyJSBackend()
@time GeoDataFrames.plot(neighborhoods,
    label=:Name,
    plt = Plots.plot(bg = :white))
Plotly javascript loaded.
To load again call
init_notebook(true)
[Plots.jl] Initializing backend: plotlyjs
6.675909 seconds (6.51 M allocations: 285.039 MB, 1.54% gc time)
@time GeoDataFrames.plot(neighborhoods,
    label = :Name,
    plt = Plots.plot(bg = :white),
    fillvalue = :numrodents,
    legend = false)
0.312263 seconds (1.11 M allocations: 55.259 MB, 4.18% gc time)
@time GeoDataFrames.plot(neighborhoods,
    label = :Name,
    plt = Plots.plot(bg = :white),
    fillvalue = :numrodents,
    legend = false,
    colorgrad = Plots.cgrad(:blues))
0.177945 seconds (1.04 M allocations: 52.231 MB, 6.05% gc time)
For more options, check out the documentation at https://juliaplots.github.io/.
using Interact
pyplot()
@manipulate for cg in [:inferno, :heat, :blues]
    GeoDataFrames.plot(neighborhoods,
        label = :Name,
        plt = Plots.plot(bg = :white),
        fillvalue = :numrodents,
        legend = false,
        colorgrad = Plots.cgrad(cg))
end
If you were to make a change, what might it be?
Source: https://xkcd.com/833/
If you were to make a change, what might it be?
Source: https://twitter.com/MapOfTheWeek/status/588055199719251970
If you were to make a change, what might it be?
Source: http://xkcd.com/1138/ (and discussion)
If you were to make a change, what might it be?
work-in-progress at https://github.com/davidagold/jplyr.jl and https://github.com/yeesian/SQLQuery.jl/tree/parse-expr
using SQLQuery
@sqlquery tbl |> select(a,b)
SELECT a, b FROM tbl
@sqlquery tbl |> filter(a > 100, b == "maptime")
SELECT * FROM tbl WHERE a > 100 AND b == 'maptime'
@sqlquery tbl |> orderby(desc(a))
SELECT * FROM tbl ORDER BY a DESC
You compose "verbs" using "pipes" (|>):
@sqlquery tbl |>
    select(*, result = sqrt(column1, col2), a = min(2,column3)) |>
    filter(result > 1000) |>
    orderby(desc(a))
SELECT * FROM (SELECT * FROM (SELECT *, sqrt(column1,col2) AS result, min(2,column3) AS a FROM tbl) WHERE result > 1000) ORDER BY a DESC
For inspiration, see the dplyr introduction: https://cran.r-project.org/web/packages/dplyr/vignettes/introduction.html
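As a rough illustration of how a piped chain of verbs can turn into nested SQL, the sketch below walks a quoted pipeline expression in plain Julia (1.x assumed). It is a toy, not SQLQuery.jl's implementation: the function names `pipeline` and `tosql` are hypothetical, only `select` and `filter` are handled, and arguments are rendered with `string` rather than translated into real SQL operators.

```julia
# Flatten a left-nested pipeline expression into its source and its verbs.
function pipeline(ex)
    if ex isa Expr && ex.head == :call && ex.args[1] == Symbol("|>")
        source, verbs = pipeline(ex.args[2])   # left side: the rest of the chain
        push!(verbs, ex.args[3])               # right side: the latest verb
        return source, verbs
    end
    return ex, Any[]                           # innermost term: the table name
end

# Toy translation: each verb wraps the SQL built so far in a subquery.
function tosql(ex)
    source, verbs = pipeline(ex)
    sql = string(source)
    for (i, verb) in enumerate(verbs)
        name, args = verb.args[1], verb.args[2:end]
        from = i == 1 ? sql : "($sql)"
        if name == :select
            sql = "SELECT " * join(string.(args), ", ") * " FROM $from"
        elseif name == :filter
            sql = "SELECT * FROM $from WHERE " * join(string.(args), " AND ")
        else
            error("unsupported verb: $name")
        end
    end
    return sql
end

tosql(:(tbl |> select(a, b) |> filter(a > 100)))
# "SELECT * FROM (SELECT a, b FROM tbl) WHERE a > 100"
```

Because `|>` is left-associative, the outermost call in the expression is the last verb, which is why the recursion descends into the left argument first.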
expression = :(tbl |>
    select(*, result = sqrt(column1, col2), a = min(2,column3)) |>
    filter(result > 1000) |>
    orderby(desc(a)))
:(((tbl |> select(*,result=sqrt(column1,col2),a=min(2,column3))) |> filter(result > 1000)) |> orderby(desc(a)))
dump(SQLQuery._sqlquery(expression))
SQLQuery.OrderbyNode{SQLQuery.FilterNode{SQLQuery.SelectNode{Symbol}}}
  input: SQLQuery.FilterNode{SQLQuery.SelectNode{Symbol}}
    input: SQLQuery.SelectNode{Symbol}
      input: Symbol tbl
      args: Array(Any,(3,))
        1: Symbol *
        2: Expr
          head: Symbol kw
          args: Array(Any,(2,))
          typ: Any
        3: Expr
          head: Symbol kw
          args: Array(Any,(2,))
          typ: Any
    args: Array(Expr,(1,)) [:(result > 1000)]
  args: Array(Any,(1,))
    1: Expr
      head: Symbol call
      args: Array(Any,(2,))
        1: Symbol desc
        2: Symbol a
      typ: Any
SQLQuery.translatesql(SQLQuery._sqlquery(expression))
" SELECT *\n FROM (SELECT *\n FROM (SELECT *,\n sqrt(column1,col2) AS result,\n min(2,column3) AS a\n FROM tbl)\n WHERE result > 1000)\nORDER BY a DESC"
macro inspect(args...)
    ArchGDAL.registerdrivers() do
        ArchGDAL.read("data/test-2.3.sqlite") do dataset
            sqlcommand = SQLQuery.translatesql(SQLQuery._sqlquery(args))
            ArchGDAL.executesql(dataset, sqlcommand) do results
                GeoDataFrames.geodataframe(results)
            end
        end
    end
end
@inspect towns |>
    select(*) |>
    limit(5)
| | geometry0 | PK_UID | Name | Peoples | LocalCounc | County | Region |
---|---|---|---|---|---|---|---|
1 | Geometry: POINT (427002.77 4996361.33) | 1 | Brozolo | 435 | 1 | 0 | 0 |
2 | Geometry: POINT (367470.48 4962414.5) | 2 | Campiglione-Fenile | 1284 | 1 | 0 | 0 |
3 | Geometry: POINT (390084.12 5025551.73) | 3 | Canischio | 274 | 1 | 0 | 0 |
4 | Geometry: POINT (425246.99 5000248.3) | 4 | Cavagnolo | 2281 | 1 | 0 | 0 |
5 | Geometry: POINT (426418.89 4957737.37) | 5 | Magliano Alfieri | 1674 | 1 | 0 | 0 |
@sqlquery towns |>
    select(name, peoples) |>
    filter(peoples > 350000) |>
    orderby(desc(peoples))
SELECT * FROM (SELECT * FROM (SELECT name, peoples FROM towns) WHERE peoples > 350000) ORDER BY peoples DESC
@inspect towns |>
    select(name, peoples) |>
    filter(peoples > 350000) |>
    orderby(desc(peoples))
| | name | peoples |
---|---|---|
1 | Roma | 2546804 |
2 | Milano | 1256211 |
3 | Napoli | 1004500 |
4 | Torino | 865263 |
5 | Palermo | 686722 |
6 | Genova | 610307 |
7 | Bologna | 371217 |
8 | Firenze | 356118 |
@sqlquery towns |>
    select(ntowns = count(*),
           smaller = min(peoples),
           bigger = max(peoples),
           totalpeoples = sum(peoples),
           meanpeoples = sum(peoples) / count(*))
SELECT count(*) AS ntowns, min(peoples) AS smaller, max(peoples) AS bigger, sum(peoples) AS totalpeoples, sum(peoples) / count(*) AS meanpeoples FROM towns
@inspect towns |>
    select(ntowns = count(*),
           smaller = min(peoples),
           bigger = max(peoples),
           totalpeoples = sum(peoples),
           meanpeoples = sum(peoples) / count(*))
| | ntowns | smaller | bigger | totalpeoples | meanpeoples |
---|---|---|---|---|---|
1 | 8101 | 33 | 2546804 | 57006147 | 7036 |
@inspect highways |>
    select(PK_UID,
           npts = npoint(geometry),
           astext(startpoint(geometry)),
           astext(endpoint(geometry)),
           x(nthpoint(geometry,2)),
           y(nthpoint(geometry,2))) |>
    orderby(desc(npts))
| | PK_UID | npts | ST_AsText(ST_StartPoint(geometry)) | ST_AsText(ST_EndPoint(geometry)) | ST_X(ST_PointN(geometry,2)) | ST_Y(ST_PointN(geometry,2)) |
---|---|---|---|---|---|---|
1 | 774 | 6758 | POINT(632090.156998 4835616.546126) | POINT(663300.737479 4795631.803342) | 632086.0096648833 | 4.835625748753577e6 |
2 | 775 | 5120 | POINT(663292.190654 4795627.307765) | POINT(632085.166691 4835620.171885) | 663295.9924954532 | 4.795626489419861e6 |
3 | 153 | 4325 | POINT(668247.593086 4862272.349444) | POINT(671618.13304 4854179.734158) | 668232.5292849538 | 4.862273561966714e6 |
4 | 205 | 3109 | POINT(671613.424233 4854121.472532) | POINT(654264.259259 4855357.41189) | 671610.5236143031 | 4.854129554368173e6 |
5 | 773 | 2755 | POINT(619601.675367 4855174.599496) | POINT(668724.797158 4862015.941886) | 619593.7115396853 | 4.855174743988363e6 |
6 | 767 | 2584 | POINT(669230.644526 4861399.656095) | POINT(656778.219794 4841754.820045) | 669235.2923171917 | 4.861388341674283e6 |
7 | 207 | 2568 | POINT(654262.489833 4855356.779528) | POINT(671604.674669 4854161.831221) | 654264.1975197266 | 4.8553668330274895e6 |
8 | 151 | 2333 | POINT(678698.183542 4835739.644472) | POINT(671608.752851 4854176.222572) | 678649.2699053766 | 4.835784869429852e6 |
9 | 149 | 2206 | POINT(671618.220589 4854185.937448) | POINT(678650.789552 4835773.241197) | 671629.6233445867 | 4.854190205654052e6 |
10 | 769 | 2200 | POINT(657836.87225 4842388.82151) | POINT(668753.698596 4861983.767314) | 657846.6346508698 | 4.842390798813466e6 |
11 | 364 | 2057 | POINT(664216.052197 4910551.772908) | POINT(667841.057054 4863143.709969) | 664158.7224665079 | 4.9105169433443e6 |
12 | 747 | 1992 | POINT(642334.928386 4910026.621566) | POINT(652123.994165 4868904.786248) | 642336.6392315791 | 4.910026658828413e6 |
13 | 492 | 1915 | POINT(778873.993104 4885381.495322) | POINT(697280.883854 4850452.857642) | 778875.6107259771 | 4.885383951525334e6 |
14 | 698 | 1879 | POINT(1075657.324742 4665962.507322) | POINT(1088521.809982 4641907.256511) | 1.0756515959733103e6 | 4.665966841282304e6 |
15 | 770 | 1834 | POINT(671524.375018 4854054.315521) | POINT(671605.495811 4854157.963121) | 671511.1339270872 | 4.854045698825569e6 |
16 | 541 | 1805 | POINT(768687.708934 4875800.342242) | POINT(727246.015043 4847936.331512) | 768668.9266934418 | 4.87588074814125e6 |
17 | 414 | 1794 | POINT(625250.079541 4789515.994231) | POINT(668731.310305 4848638.152835) | 625323.0222187014 | 4.789529259853636e6 |
18 | 488 | 1794 | POINT(625250.079541 4789515.994231) | POINT(668731.310305 4848638.152835) | 625323.0222187014 | 4.789529259853636e6 |
19 | 537 | 1727 | POINT(696416.823534 4849747.49765) | POINT(752357.078757 4867509.169718) | 696416.8938116762 | 4.849745114356477e6 |
20 | 495 | 1678 | POINT(695366.937305 4849890.749257) | POINT(745339.690695 4839447.009088) | 695334.4064278378 | 4.849880253898576e6 |
21 | 772 | 1637 | POINT(647197.376462 4862795.32992) | POINT(619427.347367 4855345.192365) | 647179.858108128 | 4.862792628927675e6 |
22 | 497 | 1622 | POINT(745213.378733 4839490.103422) | POINT(794928.023983 4851103.971745) | 745228.2683560448 | 4.839462011895631e6 |
23 | 290 | 1599 | POINT(771225.986098 4823572.33269) | POINT(687099.41214 4823942.325964) | 771227.6657156587 | 4.823574847990383e6 |
24 | 130 | 1521 | POINT(726585.57513 4775073.367238) | POINT(671429.902096 4854168.307884) | 726587.1745714284 | 4.775130678919593e6 |
25 | 37 | 1482 | POINT(668539.140339 4858263.038813) | POINT(629686.499389 4802622.574554) | 668539.0191625049 | 4.858267814741329e6 |
26 | 766 | 1470 | POINT(676469.53734 4849480.120832) | POINT(671602.595908 4854175.191147) | 676468.9801308603 | 4.849474041332171e6 |
27 | 717 | 1457 | POINT(1070154.669552 4450928.712064) | POINT(1027109.998032 4461662.354314) | 1.0701546695516466e6 | 4.450928712063506e6 |
28 | 676 | 1452 | POINT(597074.183727 4906394.152034) | POINT(610608.952173 4856418.991794) | 597072.8966231112 | 4.906365525872452e6 |
29 | 424 | 1446 | POINT(665274.669503 4847056.310631) | POINT(654847.253359 4785649.133626) | 665122.7350011045 | 4.847119316587441e6 |
30 | 763 | 1439 | POINT(685560.007889 4847315.867006) | POINT(738015.356591 4883984.372051) | 685624.4547433846 | 4.847298580681927e6 |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
@sqlquery highways |>
    select(PK_UID,
           npts = npoint(geometry),
           astext(startpoint(geometry)),
           astext(endpoint(geometry)),
           x(nthpoint(geometry,2)),
           y(nthpoint(geometry,2))) |>
    orderby(desc(npts))
SELECT * FROM (SELECT PK_UID, ST_NumPoints(geometry) AS npts, ST_AsText(ST_StartPoint(geometry)), ST_AsText(ST_EndPoint(geometry)), ST_X(ST_PointN(geometry,2)), ST_Y(ST_PointN(geometry,2)) FROM highways) ORDER BY npts DESC
http://cacm.acm.org/blogs/blog-cacm/169199-data-science-workflow-overview-and-challenges/fulltext
Learn tools, and use tools, but don't accept tools. Always distrust them; always be alert for alternative ways of thinking. This is what I mean by avoiding the conviction that you "know what you're doing". -- Bret Victor, The Future of Programming
Encourage common/shared resources across multiple languages (MATLAB, R, Python, Julia, etc.):